CUDA error if enabling compile_prefill for quantization model (int8) #137

Open
yanboliang opened this issue Mar 14, 2024 · 8 comments

yanboliang commented Mar 14, 2024

Repro command:

python generate.py --compile --compile_prefill --checkpoint_path checkpoints/$MODEL_REPO/model_int8.pth
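
The assertion in the log below (`0 <= tmp4 < 32000`) suggests an index lookup into a 32000-row table, most likely the token embedding, received an out-of-range value. A minimal sketch for checking token ids on the host before they reach a compiled kernel follows. The helper name `find_out_of_range` and the constant `VOCAB_SIZE` are hypothetical and not part of gpt-fast; the value 32000 is taken only from the assertion bound.

```python
from typing import Iterable, List, Tuple

# Assumption: 32000 is the embedding table size, inferred from the assertion bound in the log.
VOCAB_SIZE = 32000


def find_out_of_range(token_ids: Iterable[int], vocab_size: int = VOCAB_SIZE) -> List[Tuple[int, int]]:
    """Return (position, token_id) pairs that violate 0 <= token_id < vocab_size."""
    return [
        (pos, tok)
        for pos, tok in enumerate(token_ids)
        if not 0 <= tok < vocab_size
    ]


if __name__ == "__main__":
    # Illustrative ids only; real ids would come from the model's sampled tokens.
    sample = [1, 15043, 31999, 32000, -1]
    for pos, tok in find_out_of_range(sample):
        print(f"position {pos}: token id {tok} is outside [0, {VOCAB_SIZE})")
```

With a tensor of token ids, something like `find_out_of_range(ids.flatten().tolist())` could be called on the output of the prefill step to see whether the bad index is produced there or later in decoding.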

Errors:

(pt) [[email protected] ~/local/gpt-fast (main)]$ python generate.py --compile --compile_prefill --checkpoint_path checkpoints/$MODEL_REPO/model_int8.pth
/home/ybliang/local/miniconda3/envs/pt/lib/python3.10/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
Using device=cuda
Loading model ...
Using int8 weight-only quantization!
Time to load model: 6.15 seconds
/home/ybliang/local/pytorch/torch/backends/cuda/__init__.py:342: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see, torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
  warnings.warn(
unknown:0: unknown: block: [0,0,0], thread: [128,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [129,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [130,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [131,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [132,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [133,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [134,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [135,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [136,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [137,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [138,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [139,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [140,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [141,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [142,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [143,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [144,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [145,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [146,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [147,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [148,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [149,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [150,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [151,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [152,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [153,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [154,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [155,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [156,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [157,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [158,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [159,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [192,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [193,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [194,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [195,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [196,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [197,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [198,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [199,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [200,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [201,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [202,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [203,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [204,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [205,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [206,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [207,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [208,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [209,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [210,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [211,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [212,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [213,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [214,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [215,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [216,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [217,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [218,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [219,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [220,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [221,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [222,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [223,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [160,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [161,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [162,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [163,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [164,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [165,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [166,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [167,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [168,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [169,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [170,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [171,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [172,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [173,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [174,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [175,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [176,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [177,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [178,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [179,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [180,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [181,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [182,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [183,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [184,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [185,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [186,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [187,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [188,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [189,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [190,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [191,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [64,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [65,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [66,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [67,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [68,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [69,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [70,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [71,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [72,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [73,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [74,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [75,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [76,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [77,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [78,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [79,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [80,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [81,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [82,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [83,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [84,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [85,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [86,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [87,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [88,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [89,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [90,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [91,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [92,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [93,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [94,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [95,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [224,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [225,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [226,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [227,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [228,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [229,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [230,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [231,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [232,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [233,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [234,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [235,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [236,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [237,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [238,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [239,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [240,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [241,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [242,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [243,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [244,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [245,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [246,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [247,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [248,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [249,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [250,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [251,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [252,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [253,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [254,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [255,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [32,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [33,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [34,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [35,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [36,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [37,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [38,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [39,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [40,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [41,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [42,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [43,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [44,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [45,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [46,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [47,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [48,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [49,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [50,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [51,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [52,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [53,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [54,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [55,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [56,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [57,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [58,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [59,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [60,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [61,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [62,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [63,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [0,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [1,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [2,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [3,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [4,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [5,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [6,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [7,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [8,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [9,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [10,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [11,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [12,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [13,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [14,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [15,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [16,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [17,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [18,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [19,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [20,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [21,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [22,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [23,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [24,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [25,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [26,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [27,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [28,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [29,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [30,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [31,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [96,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [97,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [98,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [99,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [100,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [101,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [102,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [103,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [104,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [105,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [106,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [107,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [108,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [109,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [110,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [111,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [112,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [113,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [114,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [115,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [116,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [117,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [118,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [119,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [120,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [121,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [122,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [123,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [124,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [125,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [126,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
unknown:0: unknown: block: [0,0,0], thread: [127,0,0] Assertion `index out of bounds: 0 <= tmp4 < 32000` failed.
Traceback (most recent call last):
  File "/data/users/ybliang/gpt-fast/generate.py", line 421, in <module>
    main(
  File "/data/users/ybliang/gpt-fast/generate.py", line 359, in main
    y, metrics = generate(
  File "/home/ybliang/local/pytorch/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/users/ybliang/gpt-fast/generate.py", line 202, in generate
    generated_tokens, _ = decode_n_tokens(model, next_token.view(1, -1), input_pos, max_new_tokens - 1, callback=callback, **sampling_kwargs)
  File "/data/users/ybliang/gpt-fast/generate.py", line 74, in decode_n_tokens
    next_token, next_prob = decode_one_token(
  File "/home/ybliang/local/pytorch/torch/_dynamo/eval_frame.py", line 450, in _fn
    return fn(*args, **kwargs)
  File "/data/users/ybliang/gpt-fast/generate.py", line 64, in decode_one_token
    def decode_one_token(model: Transformer, x: torch.Tensor, input_pos: torch.Tensor, **sampling_kwargs) -> Tuple[torch.Tensor, torch.Tensor]:
  File "/home/ybliang/local/pytorch/torch/_dynamo/eval_frame.py", line 450, in _fn
    return fn(*args, **kwargs)
  File "/home/ybliang/local/pytorch/torch/_dynamo/external_utils.py", line 36, in inner
    return fn(*args, **kwargs)
  File "/home/ybliang/local/pytorch/torch/_functorch/aot_autograd.py", line 917, in forward
    return compiled_fn(full_args)
  File "/home/ybliang/local/pytorch/torch/_functorch/_aot_autograd/utils.py", line 89, in g
    return f(*args)
  File "/home/ybliang/local/pytorch/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 106, in runtime_wrapper
    all_outs = call_func_at_runtime_with_args(
  File "/home/ybliang/local/pytorch/torch/_functorch/_aot_autograd/utils.py", line 113, in call_func_at_runtime_with_args
    out = normalize_as_list(f(args))
  File "/home/ybliang/local/pytorch/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 152, in rng_functionalization_wrapper
    return compiled_fw(args)
  File "/home/ybliang/local/pytorch/torch/_inductor/codecache.py", line 906, in __call__
    return self.get_current_callable()(inputs)
  File "/home/ybliang/local/pytorch/torch/_inductor/compile_fx.py", line 838, in run
    return compiled_fn(new_inputs)
  File "/home/ybliang/local/pytorch/torch/_inductor/cudagraph_trees.py", line 383, in deferred_cudagraphify
    fn, out = cudagraphify(model, inputs, new_static_input_idxs, *args, **kwargs)
  File "/home/ybliang/local/pytorch/torch/_inductor/cudagraph_trees.py", line 411, in cudagraphify
    return manager.add_function(
  File "/home/ybliang/local/pytorch/torch/_inductor/cudagraph_trees.py", line 1943, in add_function
    return fn, fn(inputs)
  File "/home/ybliang/local/pytorch/torch/_inductor/cudagraph_trees.py", line 1757, in run
    out = self._run(new_inputs, function_id)
  File "/home/ybliang/local/pytorch/torch/_inductor/cudagraph_trees.py", line 1798, in _run
    return self.run_eager(new_inputs, function_id)
  File "/home/ybliang/local/pytorch/torch/_inductor/cudagraph_trees.py", line 1913, in run_eager
    return node.run(new_inputs)
  File "/home/ybliang/local/pytorch/torch/_inductor/cudagraph_trees.py", line 616, in run
    out = self.wrapped_function.model(new_inputs)
  File "/home/ybliang/local/pytorch/torch/_inductor/codecache.py", line 934, in _run_from_cache
    return compiled_graph.compiled_artifact(inputs)
  File "/tmp/torchinductor_ybliang/mi/cmiek2ltsrliaqercc2b6xcfebjyeel2kxpgdgc65xbyxpekhh5j.py", line 2020, in call
    triton_red_fused_add_bmm_embedding_mm_mul_11.run(buf19, arg75_1, buf20, arg77_1, arg78_1, arg455_1, arg65_1, buf16, arg73_1, arg79_1, buf22, 4096, 11008, grid=grid(4096), stream=stream0)
  File "/home/ybliang/local/pytorch/torch/_inductor/triton_heuristics.py", line 635, in run
    self.autotune_to_one_config(*args, grid=grid, **kwargs)
  File "/home/ybliang/local/pytorch/torch/_inductor/triton_heuristics.py", line 531, in autotune_to_one_config
    timings = self.benchmark_all_configs(*args, **kwargs)
  File "/home/ybliang/local/pytorch/torch/_dynamo/utils.py", line 262, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/ybliang/local/pytorch/torch/_inductor/triton_heuristics.py", line 507, in benchmark_all_configs
    timings = {
  File "/home/ybliang/local/pytorch/torch/_inductor/triton_heuristics.py", line 508, in <dictcomp>
    launcher: self.bench(launcher, *args, **kwargs)
  File "/home/ybliang/local/pytorch/torch/_inductor/triton_heuristics.py", line 479, in bench
    return do_bench(kernel_call, rep=40, fast_flush=True)
  File "/home/ybliang/local/pytorch/torch/_inductor/utils.py", line 170, in do_bench
    return triton_do_bench(*args, **kwargs)[0]
  File "/data/users/ybliang/triton/python/triton/testing.py", line 101, in do_bench
    torch.cuda.synchronize()
  File "/home/ybliang/local/pytorch/torch/cuda/__init__.py", line 792, in synchronize
    return torch._C._cuda_synchronize()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

generated kernel file: https://gist.github.com/yanboliang/6f5c1171e63909b995b5372dc7c88ab7

@yanboliang yanboliang changed the title CUDA error if compile decode_one_token with dynamic=True + --compile_prefill for quantization model (int8) CUDA error if enabling compile_prefill for quantization model (int8) Mar 14, 2024
@jerrymannil

Seeing similar issues with AMD GPUs as well.
With AMD GPUs, we are seeing a memory fault rather than device assertions.
It looks like the kernels generated for AMD don't have these device asserts.

Memory access fault by GPU node-3 (Agent handle: 0x80e6680) on address 0x7eff45229000. Reason: Unknown.
Aborted (core dumped)

@jerrymannil

Observations:

  1. Running with "--compile_prefill" alone, without "--compile", works fine (i.e., I had to move the prefill compile outside of the "if compile" check)
  2. The error happens during the first kernel run for decode
  3. The generated wrapper code can run by itself without this error.

So it seems to me the error is related to some interaction between the compiled prefill and decode kernels.

@jerrymannil

jerrymannil commented Mar 28, 2024

Looks like prefill compile can work, if I change next_token.view(1, -1) to next_token.clone().view(1, -1) here
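
The thread does not say why the extra clone() helps. One possible reading, stated here as an assumption rather than a confirmed diagnosis: next_token.view(1, -1) shares storage with the prefill output, so if that storage is later reused, decode could read a garbage token id and trip the 0 <= tmp4 < 32000 bounds check. The torch-free sketch below only illustrates view-versus-copy aliasing with Python's array and memoryview; static_out and in_vocab are hypothetical names, not gpt-fast code.

```python
from array import array

VOCAB_SIZE = 32000  # upper bound taken from the assertion in the log

# Hypothetical stand-in for an output buffer that a later run may overwrite.
static_out = array("q", [1234])

# Analogous to next_token.view(1, -1): no copy, same underlying storage.
aliased = memoryview(static_out)

# Analogous to next_token.clone(): a fresh buffer with the same contents.
copied = array("q", static_out)

# Simulate the buffer being reused for unrelated data.
static_out[0] = 7_000_000


def in_vocab(token_id: int) -> bool:
    """Mirror the bounds check an embedding lookup performs."""
    return 0 <= token_id < VOCAB_SIZE


print(in_vocab(aliased[0]))  # the aliased value follows the overwrite
print(in_vocab(copied[0]))   # the copy still holds the original token id
```

In this analogy the aliased value fails the bounds check after the overwrite, while the copy is unaffected. Whether the same mechanism is what happens inside the compiled kernels is not established in this thread.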

@griff4692

Is there a resolution to this problem for --compile only? I am still getting it

pytorch-triton==3.0.0+45fff310c8
torch==2.4.0.dev20240527+cu121
torchaudio==2.2.0.dev20240528+cu121
torchvision==0.19.0.dev20240528+cu121

@yanboliang
Contributor Author

@griff4692 Does #137 (comment) work?

@griff4692

@griff4692 Does #137 (comment) work?

Nope, unfortunately. It looks like next_token is already cloned in the current code anyway.

#137 (comment)

@yanboliang
Contributor Author

yanboliang commented Jun 19, 2024

@griff4692 It seems you hit a different issue from this one; I tried your command and it works well on gpt-fast. So I suspect some change on the context-compression side triggered a cudagraph error. I'm looking into what triggers it now.

@jerrymannil

jerrymannil commented Jul 15, 2024

The prefill issue, i.e. the assertion failure (NVIDIA) and the memory fault (AMD), should be fixed by 2c33914.
So we can close this issue.
