[Feature] CNKI segmented translation #749

Closed
1 task done
systemoutprintlnhelloworld opened this issue Mar 13, 2024 · 4 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@systemoutprintlnhelloworld

systemoutprintlnhelloworld commented Mar 13, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

  • OS: Windows
  • Zotero Version: 6.0
  • Plugin Version: latest

Describe the feature request

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
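As is well known, CNKI's latest sentence translation imposes a word-count limit on non-members. An earlier issue, #196, suggested translating in segments to get around this limit, but the author seems either not to have noticed it or to have had other considerations. Since CNKI's translation quality is genuinely high and I depend on it heavily, I am opening a new issue to discuss this request.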

Why do you need this feature?
A clear and concise description of why you need this feature.
CNKI's translation quality is genuinely high, and I depend on it heavily.

Describe the solution you'd like

The solution you'd like
A clear and concise description of what you want to happen.
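Not everyone needs this feature, and splitting at periods may not divide the text cleanly into sense groups, so this should ideally be an optional setting.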

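  1. Decide on an explicit per-request translation limit.
  2. Split the text to be translated into preliminary segments, based on its word count divided by the per-request limit.
  3. Working from front to back, find the last period in each preliminary segment and move the words after that period into the next preliminary segment.
  4. If the current preliminary segment still exceeds the per-request limit, find the end of the second-to-last sentence in the segment and move the trailing text as in step 3.
  5. Repeat step 4 until the current preliminary segment is under the per-request limit, then check the next preliminary segment, repeating steps 3 and 4 whenever it is over the limit. Note that sentences overflowing from the last preliminary segment should go into a newly created segment; if that segment also exceeds the per-request limit, keep adding segments.
  6. Submit the translation requests in order. If the RPS (requests per second) limit is exceeded, set a sleep time as appropriate and show a notice to the user (optional).
  7. Assemble the returned results.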
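
The steps above can be sketched roughly as follows. This is a simplified greedy variant, not the exact procedure described: instead of pre-splitting and moving sentences backward, it packs whole sentences forward until the next one would exceed the limit. The names `countWords`, `packSentences`, `translateInSegments`, and `delayMs` are hypothetical, `translate` is a caller-supplied placeholder rather than a real CNKI API, and counting words by whitespace and using a fixed delay instead of real RPS detection are both assumptions.

```javascript
// Hypothetical helper: count whitespace-separated words.
function countWords(s) {
  const t = s.trim();
  return t === "" ? 0 : t.split(/\s+/).length;
}

// Split after each period that is followed by whitespace (a simplification:
// abbreviations such as "e.g. " will also be treated as sentence ends).
function splitAtPeriods(text) {
  const sentences = [];
  const re = /\.\s+/g;
  let start = 0;
  let m;
  while ((m = re.exec(text)) !== null) {
    sentences.push(text.slice(start, m.index + 1));
    start = m.index + m[0].length;
  }
  if (start < text.length) sentences.push(text.slice(start));
  return sentences;
}

// Greedily pack whole sentences into segments of at most `limit` words.
// A single sentence longer than `limit` is kept as its own oversized segment.
function packSentences(text, limit) {
  const segments = [];
  let current = "";
  for (const sentence of splitAtPeriods(text.trim())) {
    const candidate = current ? current + " " + sentence : sentence;
    if (current && countWords(candidate) > limit) {
      segments.push(current);
      current = sentence;
    } else {
      current = candidate;
    }
  }
  if (current) segments.push(current);
  return segments;
}

// Submit segments one at a time, pausing `delayMs` between requests,
// then join the translated pieces in their original order.
async function translateInSegments(text, limit, translate, delayMs = 0) {
  const results = [];
  for (const segment of packSentences(text, limit)) {
    if (results.length > 0 && delayMs > 0) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
    results.push(await translate(segment));
  }
  return results.join(" ");
}
```

Sending requests sequentially keeps the results in order without extra bookkeeping, at the cost of a longer total wait for long texts.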

In addition, if there are not enough periods, split points can be chosen in the following order of priority: period >> comma >> space >> last word.
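
One possible reading of this priority order is sketched below, assuming the limit is counted in whitespace-separated words. Under that assumption "space" and "last word" collapse into the same fallback, which is to cut after the last word that fits. The function names `splitByPriority` and `lastIndexEndingWith` are made up for illustration, and runs of whitespace are normalized to single spaces.

```javascript
// Index of the last word in `words` that ends with `mark`, or -1 if none.
function lastIndexEndingWith(words, mark) {
  for (let i = words.length - 1; i >= 0; i--) {
    if (words[i].endsWith(mark)) return i;
  }
  return -1;
}

// Split `text` into segments of at most `limit` words, preferring to end
// each segment at a period, then at a comma, then at any word boundary.
function splitByPriority(text, limit) {
  if (limit < 1) throw new RangeError("limit must be at least 1");
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0);
  const segments = [];
  let start = 0;
  while (start < words.length) {
    let end = Math.min(start + limit, words.length); // exclusive index
    if (end < words.length) {
      const chunk = words.slice(start, end);
      for (const mark of [".", ","]) {
        const i = lastIndexEndingWith(chunk, mark);
        if (i >= 0) {
          end = start + i + 1; // cut right after the punctuation
          break;
        }
      }
      // No period or comma found: keep `end`, cutting after the last word.
    }
    segments.push(words.slice(start, end).join(" "));
    start = end;
  }
  return segments;
}

const parts = splitByPriority("a b, c d e f. g h i j k", 4);
// parts is ["a b,", "c d e f.", "g h i j", "k"]
```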

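Note that this approach still has many shortcomings, for example:

  1. It loses the context that pronouns such as "they" refer to, which hurts the readability of the translation. There may be a better approach.
  2. It only suits formally written text. If the writing is casual, with no periods or with commas used where periods belong, the translation may come out worse.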

Alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
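Show a reminder popup and refuse to translate when the per-request limit is exceeded, or show a silent notice in the translation panel.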

Anything else?

No response

@windingwind
Owner

My time is limited, so I cannot attend to every detailed feature request. PRs are welcome.

@systemoutprintlnhelloworld
Author

Thanks for the reply. Completely understood.

windingwind added the help wanted label Mar 13, 2024
@sasaju
Contributor

sasaju commented Mar 25, 2024

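#762 I tried implementing this and have submitted a PR, though it only splits on periods, question marks, and exclamation marks. It is not a perfect solution yet, and suggestions are welcome. I also tried using the compromise package for sentence splitting: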

const nlp = require("compromise");

const text = "Hello world. This is a test. TypeScript is awesome!";
const doc = nlp(text);
const sentences = doc.sentences().out('array');

However, it does not seem to run in the Zotero environment.
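
For comparison, splitting on periods, question marks, and exclamation marks can be done without any external package. The sketch below is not the code from PR #762; it is a minimal illustration using only built-in string and regular-expression features. The function name `splitOnTerminators` is hypothetical, and including the full-width marks 。！？ is an added assumption.

```javascript
// Split text into sentences at ., !, ? and their full-width counterparts.
// Each match is a run of non-terminator characters plus any terminators
// that immediately follow it. Caveat: a decimal such as "3.14" or an
// abbreviation such as "e.g." is also split at its period.
function splitOnTerminators(text) {
  const matches = text.match(/[^.!?。！？]+[.!?。！？]*/g) || [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

const text = "Hello world. This is a test. TypeScript is awesome!";
const sentences = splitOnTerminators(text);
// sentences is ["Hello world.", "This is a test.", "TypeScript is awesome!"]
```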

@windingwind
Owner

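It doesn't need to be too complicated; getting a translation out is enough. Machine translation isn't very accurate to begin with.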
