Commit 96a982a

fix: better warmup error
1 parent f9910d1

File tree

1 file changed: +1 −1

server/text_generation_server/models/flash_causal_lm.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -670,7 +670,7 @@ def warmup(self, batch: FlashCausalLMBatch):
                 self.device,
             )
             _, batch = self.generate_token(batch)
-        except Exception as e:
+        except torch.cuda.OutOfMemoryError as e:
             raise RuntimeError(
                 f"Not enough memory to handle {len(batch.input_ids)} prefill tokens. "
                 f"You need to decrease `--max-batch-prefill-tokens`"
```
