This repository has been archived by the owner on Sep 24, 2024. It is now read-only.

Need help for summarization task for the XSum dataset (Out of range error) #7

Closed
wonjininfo opened this issue May 28, 2020 · 4 comments

Comments

@wonjininfo

wonjininfo commented May 28, 2020

Hi,
Congratulations on the great work! I appreciate you all making these resources publicly available.

I am currently working on reproducing the summarization results using the provided checkpoints.
This was successful for the other datasets.
However, when I tried the XSum dataset, evaluation ended with an Out of range error.

Currently, for the XSum dataset, TensorFlow Datasets (TFDS) requires you to manually download the preprocessed dataset and place it in a specific location, such as ~/tensorflow_datasets/downloads/manual/xsum-extracts-from-downloads.tar.gz.
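A quick way to rule out a misplaced archive is to check that the file exists at the expected location before running the evaluation. The sketch below is only an illustration: the directory and file name are the ones quoted above, the helper `find_manual_archive` is a hypothetical name, and other TFDS setups may use a different data directory.

```python
from pathlib import Path

# Location quoted in the post; other setups may use a different TFDS data_dir.
MANUAL_DIR = Path.home() / "tensorflow_datasets" / "downloads" / "manual"
ARCHIVE_NAME = "xsum-extracts-from-downloads.tar.gz"


def find_manual_archive(manual_dir=MANUAL_DIR, name=ARCHIVE_NAME):
    """Return the archive path if it exists and is non-empty, else None.

    Hypothetical helper for illustration; it is not part of TFDS or PEGASUS.
    """
    path = Path(manual_dir) / name
    if path.is_file() and path.stat().st_size > 0:
        return path
    return None


found = find_manual_archive()
print(found if found else f"Missing or empty: {MANUAL_DIR / ARCHIVE_NAME}")
```

This only confirms that a non-empty file is present; it says nothing about whether the archive contents match what the dataset builder expects.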

Since there are a few issues with downloading the XSum dataset from the official repository,
I had to preprocess the dataset myself to match the expected format.
Here is an example of the preprocessed data (11100448.data):

[XSUM]URL[XSUM]
http://web.archive.org/web/20160404221034/http://www.bbc.co.uk/news/entertainment-arts-11100448

[XSUM]INTRODUCTION[XSUM]
Comedian Frankie Boyle has been given his own series on Channel 4 as part of its comedy-heavy autumn 2010 schedule.

[XSUM]RESTBODY[XSUM]
Frankie Boyle's Tramadol Nights is a six-part series described as "no-holds-barred stand up with pre-filmed sketches".
Peep Show also returns for a seventh series, making it the longest running comedy in Channel 4 history.
...
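To sanity-check files in this layout, one could split each file on its `[XSUM]NAME[XSUM]` marker lines. The following is a minimal sketch that assumes every marker sits on its own line, as in the example above; `parse_xsum_file` is a hypothetical helper, not the parser that TFDS itself uses, and the sample text is made up.

```python
import re

# A marker line looks like "[XSUM]URL[XSUM]" and stands alone on its line.
MARKER = re.compile(r"^\[XSUM\](\w+)\[XSUM\]\s*$")


def parse_xsum_file(text):
    """Split one preprocessed file into {section_name: section_text}."""
    sections = {}
    current = None
    for line in text.splitlines():
        match = MARKER.match(line)
        if match:
            # Start collecting lines for a new section.
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    # Join each section's lines and trim surrounding blank lines.
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


# Made-up sample in the same layout as the example above.
sample = """[XSUM]URL[XSUM]
http://example.invalid/news-1

[XSUM]INTRODUCTION[XSUM]
A one-sentence summary.

[XSUM]RESTBODY[XSUM]
First body line.
Second body line.
"""

parsed = parse_xsum_file(sample)
print(sorted(parsed))
```

A file that is missing one of the three sections, or has an empty one, would show up as a missing key or an empty string in the returned dictionary.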

However, I ran into an Out of range error.
According to the log, the run was able to find 11334 examples:
(I0526 14:32:49.777998 140570555500352 datasets.py:215] Number of examples for config xsum test is 11334)

Do you have any idea how to solve this error?
Alternatively, I would appreciate it if I could get the prediction results (i.e., the summarized text) for the PEGASUS LARGE (C4) model!

Thank you very much!
Wonjin


Here is the full-length log (I removed some of the unnecessary warning lines):
CUDA_VISIBLE_DEVICES=1 python pegasus/bin/evaluate.py --params=xsum_transformer --param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model,batch_size=1,beam_size=5,beam_alpha=0.6 --model_dir=ckpt/pegasus_ckpt/xsum/model.ckpt-30000  --evaluate_test

WARNING:tensorflow:Estimator's model_fn (<function _estimator_model_fn.<locals>.model_fn at 0x7fd86d117c80>) includes params argument, but params are not passed to Estimator.
W0526 14:32:49.484526 140570555500352 estimator.py:1994] Estimator's model_fn (<function _estimator_model_fn.<locals>.model_fn at 0x7fd86d117c80>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_model_dir': 'ckpt/pegasus_ckpt/xsum', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fd86d116a20>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=None, num_cores_per_replica=None, per_host_input_for_training=2, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
I0526 14:32:49.485391 140570555500352 estimator.py:212] Using config: {'_model_dir': 'ckpt/pegasus_ckpt/xsum', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fd86d116a20>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=None, num_cores_per_replica=None, per_host_input_for_training=2, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
I0526 14:32:49.485875 140570555500352 tpu_context.py:220] _TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
W0526 14:32:49.486188 140570555500352 tpu_context.py:222] eval_on_tpu ignored because use_tpu is False.
I0526 14:32:49.493249 140570555500352 dataset_info.py:358] Load dataset info from /home/wonjin/tensorflow_datasets/xsum/1.1.0
I0526 14:32:49.494374 140570555500352 dataset_builder.py:287] Reusing dataset xsum (/home/wonjin/tensorflow_datasets/xsum/1.1.0)
I0526 14:32:49.494525 140570555500352 dataset_builder.py:499] Constructing tf.data.Dataset for split test, from /home/wonjin/tensorflow_datasets/xsum/1.1.0
I0526 14:32:49.777998 140570555500352 datasets.py:215] Number of examples for config xsum test is 11334
2020-05-26 14:32:50.674068: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-05-26 14:32:50.707175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:20:00.0
2020-05-26 14:32:50.707436: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-05-26 14:32:50.708788: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-05-26 14:32:50.709977: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-05-26 14:32:50.710302: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-05-26 14:32:50.711864: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-05-26 14:32:50.713033: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-05-26 14:32:50.716500: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-26 14:32:50.722362: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
INFO:tensorflow:Calling model_fn.
I0526 14:32:51.179434 140570555500352 estimator.py:1148] Calling model_fn.
INFO:tensorflow:Running infer on CPU
I0526 14:32:51.180392 140570555500352 tpu_estimator.py:3124] Running infer on CPU

INFO:tensorflow:Done calling model_fn.
I0526 14:33:01.511505 140570555500352 estimator.py:1150] Done calling model_fn.
INFO:tensorflow:Graph was finalized.
I0526 14:33:02.609902 140570555500352 monitored_session.py:240] Graph was finalized.
2020-05-26 14:33:02.611567: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-05-26 14:33:02.648825: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200000000 Hz
2020-05-26 14:33:02.652197: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x617d2c0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-05-26 14:33:02.652242: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-05-26 14:33:02.994441: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4ec4b30 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-05-26 14:33:02.994511: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): TITAN RTX, Compute Capability 7.5
2020-05-26 14:33:02.996206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:20:00.0
2020-05-26 14:33:02.996437: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-05-26 14:33:02.996472: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-05-26 14:33:02.996508: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-05-26 14:33:02.996548: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-05-26 14:33:02.996575: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-05-26 14:33:02.996619: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-05-26 14:33:02.996658: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-26 14:33:02.999220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-05-26 14:33:02.999273: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-05-26 14:33:03.001303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-26 14:33:03.001323: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2020-05-26 14:33:03.001333: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2020-05-26 14:33:03.003746: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8384 MB memory) -> physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:20:00.0, compute capability: 7.5)
INFO:tensorflow:Restoring parameters from ckpt/pegasus_ckpt/xsum/model.ckpt-30000
I0526 14:33:03.009030 140570555500352 saver.py:1284] Restoring parameters from ckpt/pegasus_ckpt/xsum/model.ckpt-30000
2020-05-26 14:33:06.812386: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Out of range: Read less bytes than requested
ERROR:tensorflow:Error recorded from prediction_loop: 2 root error(s) found.
  (0) Out of range: Read less bytes than requested
         [[node save/RestoreV2 (defined at /home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
         [[save/RestoreV2/_301]]
  (1) Out of range: Read less bytes than requested
         [[node save/RestoreV2 (defined at /home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'save/RestoreV2':
  File "pegasus/bin/evaluate.py", line 153, in <module>
    tf.app.run(main)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "pegasus/bin/evaluate.py", line 144, in main
    FLAGS.enable_logging)
  File "/hdd3/wonjin/pegasus/pegasus/eval/text_eval.py", line 153, in text_eval
    for i, features in enumerate(features_iter):
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3072, in predict
    yield_single_examples=yield_single_examples):
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 638, in predict
    hooks=all_hooks) as mon_sess:
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 1014, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 725, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 1207, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 1212, in _create_session
    return self._sess_creator.create_session()
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 878, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 638, in create_session
    self._scaffold.finalize()
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 229, in finalize
    self._saver = training_saver._get_saver_or_default()  # pylint: disable=protected-access
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 599, in _get_saver_or_default
    saver = Saver(sharded=True, allow_empty=True)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 878, in _build
    build_restore=build_restore)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 502, in _build_internal
    restore_sequentially, reshape)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 381, in _AddShardedRestoreOps
    name="restore_shard"))
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()
@wonjininfo changed the title from "Need help for summarization result for XSum dataset (Out of range error)" to "Need help for summarization task for the XSum dataset (Out of range error)" on May 28, 2020
@JingqingZ
Contributor

Hi Wonjin, this error seems to come from the xsum checkpoint rather than the xsum dataset.

@yaozhaogoogle Could you help check the xsum checkpoint released on the Google Cloud bucket? It seems the xsum checkpoint is PEGASUS_BASE rather than PEGASUS_LARGE. Checkpoints of the other datasets look correct. Thanks!

@yaozhaogoogle
Contributor

There seems to have been data corruption when the xsum ckpt was uploaded; let me re-upload it.
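For anyone who wants to confirm that a downloaded checkpoint file matches the copy in the bucket, comparing checksums is one option. The sketch below computes a file's MD5 digest in base64, which is, to my knowledge, the form that `gsutil ls -L` reports for objects in a bucket. The helper name `md5_base64` is hypothetical, and whether a hash is listed for a given object is an assumption to verify.

```python
import base64
import hashlib


def md5_base64(path, chunk_size=1 << 20):
    """Return the base64-encoded MD5 digest of a file, read in chunks.

    Hypothetical helper for illustration; reading in chunks avoids loading
    a multi-gigabyte checkpoint file into memory at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")
```

Calling `md5_base64` on each local checkpoint file and comparing the result with the hash listed for the matching object in the bucket should reveal a truncated or corrupted download. It cannot detect a file that was already corrupted when it was uploaded, which appears to be what happened here.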

@JingqingZ
Contributor

@wonjininfo The new checkpoints have been uploaded. Please download them again and give it a try. Thanks!

@wonjininfo
Author

The new checkpoint works well. Thanks!!
