Facing a "memory exhausted" error while running inference #33
I've partially trained the model, but when I went to test it and ran Inference.py, with a static story and summaries in the script, TensorFlow gave me an insufficient-memory error:

```
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[512,10,50,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
```
Comments
Have you checked whether you might be using a bigger batch size than fits in your memory?
No, I haven't changed any batch sizes; by default it should be 1 for inference too, right? And I'm using an Azure Nvidia Tesla M60 GPU with 8 GiB of memory.
I think the model is around 250 million parameters; I doubt 8 GB can handle this along with the data. Please try with 16 GB.
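For a rough sense of scale, here is a back-of-the-envelope estimate (my own arithmetic, not a measurement from this repo) of the weights and of the single tensor named in the OOM message, both assumed to be float32:

```python
# Back-of-the-envelope memory arithmetic (assumes float32, 4 bytes/element).
params = 250e6                       # ~250M parameters, the figure quoted above
weight_gib = params * 4 / 2**30      # ~0.93 GiB for the weights alone

elems = 512 * 10 * 50 * 512         # shape from the OOM message
tensor_gib = elems * 4 / 2**30       # ~0.49 GiB for that one tensor

print(f"weights: {weight_gib:.2f} GiB, OOM tensor: {tensor_gib:.2f} GiB")
```

Neither number alone exceeds 8 GiB, which is consistent with the suspicion below that the tensor shape, not the raw model size, is the problem.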
But I was able to train the model on the same GPU without any issues. I'm facing this problem only when I try to do inference on the trained model / last checkpoint.
Please post a link to the inference code you are running. The tensor shape [512,10,50,512] seems wrong; the problem might be in the way you are passing the data.
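One quick way to confirm what is actually being fed is to log the shape of every value just before `sess.run`. This is a generic TF1 debugging helper, not code from the repo; `feed_dict` is assumed to be the usual placeholder-to-value dict:

```python
import numpy as np

def print_feed_shapes(feed_dict):
    """Log the shape of every value about to be passed to sess.run."""
    for placeholder, value in feed_dict.items():
        print(placeholder.name, np.asarray(value).shape)
```

Calling this right before `sess.run(...)` should show whether a whole dataset is being fed where a single batch was expected.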
This is the inference code that I'm running (the paste came through truncated; `# ...` marks the gaps):

```python
from flask import Flask, request, render_template
app = Flask(__name__)

import sys
if not 'texar_repo' in sys.path:
    # ... (truncated)

from config import *

start_tokens = tf.fill([tx.utils.get_batch_size(src_input_ids)],  # ... (truncated)
tokenizer = tokenization.FullTokenizer(  # ... (truncated)
sess = tf.Session()

@app.route("/results", methods=["GET", "POST"])
# ... (truncated)

if __name__ == "__main__":
    # ... (truncated)

# The remainder appears to be config.py contents, run together in the paste:
loss_label_confidence = 0.9
random_seed = 1234
opt = {
    # warmup steps must be 0.1% of number of iterations
    # ... (truncated)
bos_token_id = 101
model_dir = "./models"
max_train_steps = 100000
display_steps = 1
max_decoding_length = 400
max_seq_length_src = 512
epochs = 10
is_distributed = False
data_dir = r"data/"
train_out_file = r"data/train.tf_record"
bert_pretrain_dir = r"./bert_uncased_model"
train_story = r"data/train_story.txt"
eval_story = r"data/eval_story.txt"
bert_pretrain_dir = r"../uncased_L-12_H-768_A-12"  # note: reassigned; this value wins
```
Can you change this in `infer_single_example`?
Hello, I met the same problem. How did you solve it? @Tanmay06
Hi, I was actually away working on a different project. @Simons2017 I think you should try @santhoshkolloju's reply just before your comment; it should work.
Hi @santhoshkolloju, how do I change this?
@yuyanzhoufang Change this line in the original code: `Abstractive-Summarization-With-Transfer-Learning/Inference.py`, lines 56 to 57 at commit 97ff2ae.
|
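The embedded snippet did not survive the export, so the exact lines are only visible in the repo. As a rough illustration of the kind of change being suggested (a hypothetical sketch, assuming the original lines decode the whole test set at once; the function name and parameters here are illustrative, not the repo's actual code), inference can be restricted to a single story so decode-time tensors keep a batch dimension of 1:

```python
# Hypothetical sketch, not the actual Inference.py lines 56-57.
# Idea: feed ONE tokenized story instead of the full test set.
def infer_single_example(story_text, sess, beam_ids, src_placeholder,
                         tokenizer, max_seq_length_src=512):
    tokens = tokenizer.tokenize(story_text)[:max_seq_length_src]
    ids = tokenizer.convert_tokens_to_ids(tokens)
    ids += [0] * (max_seq_length_src - len(ids))   # pad to fixed length
    return sess.run(beam_ids, feed_dict={src_placeholder: [ids]})
```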
Hi, in the class `CNNDailymail` there is `if set_type == "test" and i == 0: continue`. Why filter out the first example of the test set?
|
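For context (an assumption based on the processors in Google's BERT run_classifier.py that this class appears to follow, not an authoritative answer): the check usually skips a header row, since the first line of a BERT-style test file is column names rather than an example. A minimal sketch of the pattern:

```python
def _create_examples(lines, set_type):
    """Collect examples, skipping the test file's header row (BERT-style)."""
    examples = []
    for i, line in enumerate(lines):
        if set_type == "test" and i == 0:
            continue  # first test line is assumed to be a header, not data
        examples.append(line)
    return examples
```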