add tokenzier viewer
HarderThenHarder committed Jun 11, 2023
1 parent 45e793a commit 764cf0c
Showing 5 changed files with 395 additions and 1 deletion.
13 changes: 12 additions & 1 deletion readme.md
@@ -107,4 +107,15 @@
| Model | Link |
|---|---|
| ChatGLM-6B | [[here]](./LLM/finetune/readme.md) |
| ChatGLM-6B | [[here]](./LLM/finetune/readme.md) |


<br>

#### 9. Tools

> A collection of commonly used tools.
| Tool | Link |
|---|---|
| Tokenizer Viewer | [[here]](./tools/tokenizer_viewer/readme.md) |
Binary file added tools/tokenizer_viewer/assets/preview.png
23 changes: 23 additions & 0 deletions tools/tokenizer_viewer/readme.md
@@ -0,0 +1,23 @@
# Tokenizer Viewer

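Tokenizer Viewer is a tool for quickly previewing a tokenizer.

Its features include:

- [x] View the vocabulary contents and character-count statistics.
- [x] Search whether the tokenizer contains a given token.
- [x] Test encoding and decoding.
- [x] Compare the token differences between two tokenizers.
- [ ] Merge two different tokenizers.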
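
The vocabulary lookup, length statistics, and two-tokenizer comparison listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in `web_ui.py`: the function names `compare_vocabs` and `token_length_counts` and the toy vocabularies are hypothetical. It assumes each vocabulary is a `token -> id` dict, which for Hugging Face tokenizers could be obtained from `tokenizer.get_vocab()`.

```python
from collections import Counter


def compare_vocabs(vocab_a, vocab_b):
    """Return the tokens shared by both vocabularies and those unique to each."""
    tokens_a, tokens_b = set(vocab_a), set(vocab_b)
    return {
        "shared": sorted(tokens_a & tokens_b),
        "only_a": sorted(tokens_a - tokens_b),
        "only_b": sorted(tokens_b - tokens_a),
    }


def token_length_counts(vocab):
    """Count how many tokens have each character length."""
    return dict(sorted(Counter(len(token) for token in vocab).items()))


# Toy vocabularies (token -> id), made up for illustration only.
vocab_a = {"hello": 0, "world": 1, "##ing": 2, "a": 3}
vocab_b = {"hello": 0, "token": 1, "a": 2}

diff = compare_vocabs(vocab_a, vocab_b)
print(diff["only_a"])                # tokens found only in vocab_a
print(diff["only_b"])                # tokens found only in vocab_b
print("token" in vocab_a)            # membership test for a single token
print(token_length_counts(vocab_a))  # character length -> number of tokens
```

Set operations keep the comparison linear in the vocabulary size, and sorting the results makes the output stable for display in a UI.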

Run `start.sh` to launch the app; use the `--server.port` option to change the port it listens on.

```sh
streamlit run web_ui.py --server.port 8001
```

<div align='center'>

<img src='assets/preview.png'>

</div>
1 change: 1 addition & 0 deletions tools/tokenizer_viewer/start.sh
@@ -0,0 +1 @@
streamlit run web_ui.py --server.port 8001