I can run the context-free grammar example correctly, but when I use my own custom context-free grammar for a domain-specific language (DSL), it gets stuck after calling the generator. I believe my grammar is correct because Lark compiles it without errors.
Steps/code to reproduce the bug:
```python
import time

import outlines

# Lark grammar for the DSL. Keyword glosses: 如果 = "if", 那么 = "then",
# 并且 = "and", 或者 = "or", 的 = possessive ("name's property"),
# 包含/不包含 = "contains"/"does not contain", 匹配/不匹配 = "matches"/"does not match".
# Note: the character class in CN_ZH_LETTER also contains the literal characters
# u and " in addition to the CJK range \u4e00-\u9fa5.
snl_grammar = r'''
start: if_else
if_else: "如果" conditions "那么" conditions
conditions: condition (("并且"|"或者") condition)*
condition: name "的" property property_cmp (("并且"|"或者") property_cmp)*
property_cmp: num_cmp | str_cmp
num_cmp: (num_comp_op number) | (num_comp_op property_val_expr) | (num_comp_op simple_expr)
simple_expr: (name "的" property)
property_val_expr: (number num_cal_op simple_expr) | (simple_expr num_cal_op number) | (simple_expr num_cal_op simple_expr)
str_cmp: str_comp_op ESCAPED_STRING
num_comp_op: ">" | "<" | ">=" | "<="
num_cal_op: "+" | "-" | "*" | "/"
str_comp_op: "包含" | "不包含" | "匹配" | "不匹配"
property: name
name: WORD
number: SIGNED_NUMBER
LCASE_LETTER: "a".."z"
UCASE_LETTER: "A".."Z"
CN_ZH_LETTER: /[u"\u4e00-\u9fa5"]/
LETTER: UCASE_LETTER | LCASE_LETTER | CN_ZH_LETTER
WORD: LETTER+
%import common.SIGNED_NUMBER
%import common.WS
%import common.ESCAPED_STRING
%ignore WS
'''

# The prompt repeats the grammar, then asks the model to convert one sentence.
# English gloss of the sample text: "4.2.6 The column spacing of the pipe rack
# should satisfy the span requirements of most pipes; 6 m to 9 m is preferred."
prompt_test = '''The following is a context-free grammar for a domain-specific language:
start: if_else
if_else: "如果" conditions "那么" conditions
conditions: condition (("并且"|"或者") condition)*
condition: name "的" property property_cmp (("并且"|"或者") property_cmp)*
property_cmp: num_cmp | str_cmp
num_cmp: (num_comp_op number) | (num_comp_op property_val_expr) | (num_comp_op simple_expr)
simple_expr: (name "的" property)
property_val_expr: (number num_cal_op simple_expr) | (simple_expr num_cal_op number) | (simple_expr num_cal_op simple_expr)
str_cmp: str_comp_op ESCAPED_STRING
num_comp_op: ">" | "<" | ">=" | "<="
num_cal_op: "+" | "-" | "*" | "/"
str_comp_op: "包含" | "不包含" | "匹配" | "不匹配"
property: name
name: WORD
number: SIGNED_NUMBER
LCASE_LETTER: "a".."z"
UCASE_LETTER: "A".."Z"
CN_ZH_LETTER: /[u"\u4e00-\u9fa5"]/
LETTER: UCASE_LETTER | LCASE_LETTER | CN_ZH_LETTER
WORD: LETTER+
%import common.SIGNED_NUMBER
%import common.WS
%import common.ESCAPED_STRING
%ignore WS
Please convert the following text into the domain-specific language.
Text:
4.2.6 管廊的柱距应满足大多数管道的跨距要求,宜为6m~9m。
Output:
'''

start = time.time()
model = outlines.models.transformers("/home/yd/llm_weights/Qwen2.5-7B-Instruct")
generator = outlines.generate.cfg(model, snl_grammar)
sequence = generator(prompt_test)  # never returns
print(sequence)
total = time.time() - start
print(total)
```
Expected result:
It should output a valid sentence according to my CFG.
Error message:
There is no error message; it gets stuck after printing:
```
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00, 1.41it/s]
/home/user/.conda/envs/hc_general/lib/python3.12/site-packages/outlines/fsm/guide.py:110: UserWarning: Outlines' public *community-contributed* CFG structured generation is experimental. Please review https://dottxt-ai.github.io/outlines/latest/reference/generation/cfg#disclaimer
  warnings.warn(
```
Outlines/Python version information:
Version information
```
0.1.7
Python 3.12.1
```
Context for the issue:
I'm trying to automatically convert text into a domain-specific language; both the text and the DSL are in Chinese. I want to use CFG-constrained decoding to improve generation accuracy, but it doesn't seem to work with either Outlines or vLLM.
It's hard to debug this without more information about what your DSL output looks like. In general, with seemingly endless generation like this, the cause is likely a small issue with the CFG: it may be syntactically valid but not semantically valid.
To help debug, I'd try limiting the number of generated tokens and inspecting the output to see whether it's what you expect.
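A minimal sketch of that suggestion, assuming the `generator` and `prompt_test` objects from the reproduction script and assuming the generator accepts a `max_tokens` keyword (Outlines 0.1.x sequence generators do). The helper name `probe_generation` is hypothetical; it only caps the output length, times the call, and prints the raw text so you can compare it against the grammar.

```python
import time
from typing import Callable, Tuple


def probe_generation(
    generator: Callable[..., str], prompt: str, max_tokens: int = 10
) -> Tuple[str, float]:
    """Run one length-capped generation and return (output, elapsed seconds).

    `generator` is assumed to be an Outlines generator (or any callable)
    that takes the prompt positionally and a `max_tokens` keyword.
    """
    start = time.perf_counter()
    output = generator(prompt, max_tokens=max_tokens)
    elapsed = time.perf_counter() - start
    # repr() makes stray whitespace or quote characters easy to spot.
    print(f"{elapsed:.1f}s for max_tokens={max_tokens}: {output!r}")
    return output, elapsed


# Usage with the reproduction script (not run here):
#   generator = outlines.generate.cfg(model, snl_grammar)
#   for n in (5, 10, 20):
#       probe_generation(generator, prompt_test, max_tokens=n)
```

Increasing the cap in steps shows whether the output keeps extending one rule (for example an ever-growing `WORD`) instead of reaching the end of `start`.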
Thanks! I get a result after restricting the length to a smaller number such as 10. However, generation is quite slow: with Qwen2.5-7B on an NVIDIA RTX 4090, it takes 57 s to generate 10 characters. I don't think the CFG is very complicated; it defines a simple language similar to a Python if-else statement, with the keywords replaced by Chinese words. Is this speed normal?
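57 s for 10 characters averages roughly 5.7 s per generated character. To see whether that time is spread evenly across steps or concentrated in a few, one option is to stream the output and timestamp each chunk. This is a sketch assuming `generator.stream(prompt, max_tokens=...)` yields text chunks as they are produced; `time_stream` and `summarize` are hypothetical helpers that work with any iterable of strings.

```python
import time
from typing import Iterable, List, Tuple


def time_stream(chunks: Iterable[str]) -> List[Tuple[str, float]]:
    """Consume a stream of text chunks, recording the wait before each one."""
    timings: List[Tuple[str, float]] = []
    last = time.perf_counter()
    for chunk in chunks:  # for a lazy stream, the first wait includes the prompt prefill
        now = time.perf_counter()
        timings.append((chunk, now - last))
        last = now
    return timings


def summarize(timings: List[Tuple[str, float]]) -> str:
    """Format one line per chunk plus a total line."""
    lines = [f"{dt:6.2f}s  {chunk!r}" for chunk, dt in timings]
    total = sum(dt for _, dt in timings)
    lines.append(f"total {total:.2f}s over {len(timings)} chunks")
    return "\n".join(lines)


# Usage with the reproduction script (assumed API, not run here):
#   stream = generator.stream(prompt_test, max_tokens=10)
#   print(summarize(time_stream(stream)))
```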