Question about the training data format #137

Open
UestcJay opened this issue Dec 17, 2024 · 3 comments

Comments

@UestcJay

Hello, if I want to fine-tune this model, is the expected text format what I understand it to be: 13 text tokens interleaved with 26 audio tokens?
system_prompt:
User will provide you with a speech instruction. Do it step by step. First, think about the instruction and respond in a interleaved manner, with 13 text token followed by 26 audio tokens.
{"from":"human","value":"<|begin_of_audio|><|audio_10815|><|audio_7234|><|audio_4461|><|audio_6509|><|audio_803|><|audio_15646|><|audio_10396|><|audio_7838|><|audio_9356|><|audio_16378|><|audio_6774|><|audio_13666|><|audio_13036|><|audio_11542|><|audio_11896|><|audio_3933|><|audio_953|><|audio_13938|><|audio_5488|><|audio_1216|><|audio_11047|><|audio_10794|><|audio_307|><|audio_663|><|audio_11543|><|audio_5505|><|audio_4850|><|audio_6403|><|audio_15024|><|audio_14639|><|audio_547|><|audio_628|><|audio_6404|><|audio_13433|><|audio_9788|><|audio_5681|><|audio_3039|><|audio_13319|><|audio_12072|><|audio_12072|><|end_of_audio|>"}

{"from":"gpt","value":"1. 制定合理的工作计划,按照计划有条不紊地<|audio_710|><|audio_9273|><|audio_9694|><|audio_12535|><|audio_6023|><|audio_9326|><|audio_9005|><|audio_6774|><|audio_5255|><|audio_5968|><|audio_15088|><|audio_10458|><|audio_7859|><|audio_12959|><|audio_16128|><|audio_12328|><|audio_13882|><|audio_3591|><|audio_3063|><|audio_2954|><|audio_387|><|audio_12365|><|audio_7273|><|audio_8727|><|audio_13639|><|audio_320|>执行。2. 注意时间管理,学会分配好工作时间和<|audio_11468|><|audio_10940|><|audio_3254|><|audio_16073|><|audio_1869|><|audio_13327|><|audio_14595|><|audio_8113|><|audio_14429|><|audio_4574|><|audio_9165|><|audio_12895|><|audio_1176|><|audio_7939|><|audio_6178|><|audio_7488|><|audio_11479|><|audio_2625|><|audio_9714|><|audio_4662|><|audio_3857|><|audio_8694|><|audio_15510|><|audio_5870|><|audio_13679|><|audio_762|>休息时间。3. 避免不必要的打扰和干扰<|audio_1349|><|audio_12213|><|audio_1176|><|audio_1993|><|audio_4781|><|audio_11598|><|audio_1340|><|audio_12363|><|audio_10972|><|audio_4673|><|audio_13664|><|audio_7939|><|audio_9968|><|audio_15413|><|audio_2990|><|audio_3781|><|audio_14408|><|audio_3400|><|audio_7369|><|audio_4803|><|audio_13124|><|audio_3237|><|audio_4122|><|audio_11543|><|audio_8629|><|audio_15118|>,保持专注。4. 
掌握高效的工作方法和工具<|audio_1449|><|audio_12249|><|audio_8417|><|audio_7859|><|audio_12959|><|audio_15706|><|audio_10086|><|audio_8995|><|audio_10492|><|audio_8609|><|audio_2868|><|audio_280|><|audio_3566|><|audio_2585|><|audio_12182|><|audio_2868|><|audio_11806|><|audio_8848|><|audio_6586|><|audio_10492|><|audio_6745|><|audio_11681|><|audio_10600|><|audio_5438|><|audio_15470|><|audio_6376|>,提高工作效率。<|audio_14770|><|audio_4772|><|audio_15152|><|audio_1278|><|audio_15483|><|audio_5661|><|audio_11254|><|audio_9066|><|audio_2796|><|audio_674|><|audio_12981|><|audio_8305|><|audio_3089|><|audio_12148|><|audio_5540|><|audio_10973|><|audio_12376|><|audio_15934|><|audio_1762|><|audio_3550|><|audio_9005|><|audio_4504|><|audio_4826|><|audio_253|><|audio_6543|><|audio_9829|><|audio_1237|><|audio_10270|><|audio_15526|><|audio_13889|><|audio_14539|><|audio_13257|><|audio_12008|><|audio_8609|><|audio_10483|><|audio_10965|><|audio_5343|><|audio_9741|><|audio_8381|><|audio_3632|><|audio_4981|><|audio_5559|><|audio_1501|><|audio_5939|><|audio_13556|><|audio_15817|><|audio_4082|><|audio_13044|><|audio_5874|><|audio_2129|><|audio_5890|><|audio_13622|><|audio_317|><|audio_15179|><|audio_458|><|audio_2831|><|audio_3412|><|audio_2845|><|audio_6572|><|audio_1733|><|audio_14515|><|audio_3586|><|audio_5751|><|audio_13116|><|audio_2441|><|audio_4721|><|audio_2585|><|audio_6704|><|audio_6168|><|audio_12894|><|audio_15475|><|audio_11535|><|audio_10957|><|audio_7038|><|audio_5006|><|audio_7736|><|audio_4826|><|audio_9100|><|audio_4158|><|audio_6336|><|audio_12894|><|audio_10495|><|audio_11538|><|audio_10646|><|audio_11157|><|audio_12587|><|audio_14901|><|audio_492|><|audio_5554|><|audio_13390|><|audio_7038|><|audio_3039|><|audio_12823|>"},

"source_conversations":[{"from":"human","value":"请问如何才能在工作中保持高效率?"},{"from":"gpt","value":"1. 制定合理的工作计划,按照计划有条不紊地执行。2. 注意时间管理,学会分配好工作时间和休息时间。3. 避免不必要的打扰和干扰,保持专注。4. 掌握高效的工作方法和工具,提高工作效率。"}]

@sixsixcoder

Yes, the text format is 13 text tokens interleaved with 26 audio tokens. You can try fine-tuning with data in that form.
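For readers who want to build such a sequence themselves, here is a minimal sketch of the interleaving. It is not the maintainers' preprocessing code. The function name `interleave` and its parameters are hypothetical, and it assumes the text and audio have already been tokenized into lists. Appending all leftover audio tokens once the text is used up is an assumption based on the sample above, where the final audio run is much longer than 26.

```python
from typing import List


def interleave(text_tokens: List[str], audio_tokens: List[str],
               n_text: int = 13, n_audio: int = 26) -> List[str]:
    """Alternate chunks of n_text text tokens and n_audio audio tokens.

    Assumption (inferred from the sample in this thread, not confirmed):
    when the text is exhausted, the remaining audio tokens are appended
    as one trailing run.
    """
    out: List[str] = []
    t = a = 0
    while t < len(text_tokens):
        out.extend(text_tokens[t:t + n_text])     # up to 13 text tokens
        t += n_text
        out.extend(audio_tokens[a:a + n_audio])   # up to 26 audio tokens
        a += n_audio
    out.extend(audio_tokens[a:])                  # leftover audio, if any
    return out


if __name__ == "__main__":
    text = [f"t{i}" for i in range(15)]
    audio = [f"<|audio_{i}|>" for i in range(60)]
    print("".join(interleave(text, audio)))
```

The text tokens here must come from the model's own tokenizer; splitting by characters will not give the same 13-token boundaries.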

@Ch1r3

Ch1r3 commented Jan 6, 2025

Has the original poster found an answer to this question? Also, how did you obtain your data with 13 text tokens + 26 audio tokens? I hope to get a reply!

@mymoongit

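I found that the 13 text tokens and the 26 audio tokens do not line up exactly: toward the end of a response there are often 0 text tokens left while the audio chunks still contain 26 tokens.
Is there a way to adjust this?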
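To see where this happens in a given sample, a small diagnostic like the one below may help. It is only a sketch: `chunk_report` is a hypothetical helper that splits a `"value"` string into text pieces and runs of `<|audio_N|>` tokens and counts the audio tokens in each run. It does not count text tokens, since that requires the model's tokenizer, so text pieces are returned as raw strings.

```python
import re
from typing import List, Tuple, Union

# One or more consecutive audio tokens; the outer group is captured so that
# re.split() keeps the audio runs in its output.
AUDIO_RUN = re.compile(r"((?:<\|audio_\d+\|>)+)")
AUDIO_TOKEN = re.compile(r"<\|audio_\d+\|>")


def chunk_report(value: str) -> List[Tuple[str, Union[str, int]]]:
    """Return ("text", piece) and ("audio", token_count) entries in order."""
    report: List[Tuple[str, Union[str, int]]] = []
    for piece in AUDIO_RUN.split(value):
        if not piece:
            continue  # empty string produced when value starts/ends with audio
        if AUDIO_RUN.fullmatch(piece):
            report.append(("audio", len(AUDIO_TOKEN.findall(piece))))
        else:
            report.append(("text", piece))
    return report


if __name__ == "__main__":
    sample = "ab" + "<|audio_1|>" * 3 + "cd" + "<|audio_2|>" * 5
    for kind, info in chunk_report(sample):
        print(kind, info)
```

Because adjacent audio chunks with no text between them merge into a single run, an audio count above 26 at the end of a response indicates that the text ran out before the audio did.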
