Merge pull request #13 from sean1832/Major_Dev
Major dev 1.0.0
sean1832 authored Mar 3, 2023
2 parents 67fba77 + 75fbee3 commit 39e0ee9
Showing 15 changed files with 320 additions and 193 deletions.
2 changes: 1 addition & 1 deletion .core/manifest.json
@@ -1,6 +1,6 @@
{
"name": "GPT-Brain",
"version": "0.1.1",
"version": "1.0.0",
"license": "MIT",
"author": "Zeke Zhang",
"homepage": "https://github.com/sean1832/GPT-Brain",
19 changes: 11 additions & 8 deletions Documentation/README_CN.md
@@ -9,8 +9,15 @@

*💡I am not a professional programmer and am quite new to Python, so this project may contain all sorts of bugs. If you run into one, please report it in the [Issues section](https://github.com/sean1832/GPT-Brain/issues) and I will do my best to patch it.*

### Introduction
This program leverages [GPT-3](https://platform.openai.com/docs/models/gpt-3) & [3.5](https://platform.openai.com/docs/models/gpt-3-5) to summarize the content of atomic notes and to answer questions about specific note content.
It scans a designated directory (typically a vault containing multiple notes) and appends the contents of all the notes to a single file.
That file then serves as the context for the user's query. The program can identify relationships between the contents of the notes and generate a refined response that summarizes the key points.

Although the program is compatible with other note-taking software that uses markdown or txt, it is primarily designed with [Obsidian](https://obsidian.md/) in mind.

### Features
- [x] Generate responses using [OpenAI GPT-3](https://platform.openai.com/docs/models/gpt-3).
- [x] Generate responses using [OpenAI GPT-3](https://platform.openai.com/docs/models/gpt-3) & [GPT-3.5 (ChatGPT)](https://platform.openai.com/docs/models/gpt-3-5).
- [x] Use [OpenAI embeddings](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings) for semantic comparison of questions and note content for enhanced search.
- [x] Configurable prompts.
- [x] Customizable personal background information for more relevant answers.
@@ -23,13 +30,9 @@
- [x] Basic & advanced parameter sliders for easy tuning of OpenAI language model configurations.

### Todo
- [x] ~~Batch script to update libraries.~~
- [x] ~~Versioning.~~
- [x] ~~Tooltips for parameters.~~
- [x] ~~Multilingual UI support.~~
- [x] ~~Multilingual search support.~~
- [ ] Provide detailed documentation for users.
- [ ] Release for Windows.
- [ ] Support PDF note format.
- [ ] Support PDF OCR scanning.
- [ ] Support Word document format.

## Install
### 1. What you need
23 changes: 15 additions & 8 deletions Documentation/README_JP.md
@@ -9,8 +9,19 @@

*💡I am not a professional programmer and am fairly new to Python, so this project may contain bugs. If you find any issues, please suggest them in the [Issues section](https://github.com/sean1832/GPT-Brain/issues).*

### Introduction
This program leverages [GPT-3](https://platform.openai.com/docs/models/gpt-3) & [3.5](https://platform.openai.com/docs/models/gpt-3-5) to summarize the content of atomic notes and to answer questions related to specific notes.
It scans a designated directory, typically a vault containing multiple notes, and appends the contents of all the notes to a single file.
That file then serves as the context for the user's query. The program can identify relationships between the contents of the notes and generate a refined response that summarizes the key points.

Although the program is compatible with other note-taking software that uses markdown or txt, it is primarily designed with [Obsidian](https://obsidian.md/) in mind.

### Features
- [x] Generate responses using [OpenAI GPT-3](https://platform.openai.com/docs/models/gpt-3).
- [x] Generate responses using [OpenAI GPT-3](https://platform.openai.com/docs/models/gpt-3) & [GPT-3.5 (ChatGPT)](https://platform.openai.com/docs/models/gpt-3-5).
- [x] Use [OpenAI embeddings](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings) for semantic comparison of questions and note content to enhance search.
- [x] Configurable prompts.
- [x] Customizable personal background information for more accurate answers.
@@ -23,13 +34,9 @@
- [x] Basic & advanced parameter sliders for OpenAI language model configurations.

### Todo
- [x] ~~Batch script to update libraries.~~
- [x] ~~Versioning.~~
- [x] ~~Tooltips for parameters.~~
- [x] ~~Multilingual UI.~~
- [x] ~~Multilingual search support.~~
- [ ] Provide detailed documentation for users.
- [ ] Release for Windows.
- [ ] Support PDF.
- [ ] Support PDF OCR scanning.
- [ ] Support Word documents.

## Install
### 1. What you need
2 changes: 1 addition & 1 deletion GPT/__init__.py
@@ -1,3 +1,3 @@
from GPT import query
from GPT import toolkit
from GPT import gpt_tools
from GPT import model
94 changes: 94 additions & 0 deletions GPT/gpt_tools.py
@@ -0,0 +1,94 @@
import openai
import numpy as np
import requests
import sseclient


# Compare the similarity of two vectors via the dot product.
# The higher the dot product, the more similar the vectors.
def similarity(v1, v2):
return np.dot(v1, v2)


# return the embedding vector for the given content
def embedding(content, engine='text-embedding-ada-002'):
response = openai.Embedding.create(input=content, engine=engine)
vector = response['data'][0]['embedding']
return vector


def search_chunks(query, data, count=1):
vector = embedding(query)
points = []

for item in data:
# compare search terms with brain-data
point = similarity(vector, item['vector'])
points.append({
'content': item['content'],
'point': point
})
# sort points in descending order
ordered = sorted(points, key=lambda d: d['point'], reverse=True)

return ordered[0:count]
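
The search above ranks note chunks by the dot product of their embedding vectors against the query's embedding. A minimal sketch of that ranking logic, using hypothetical toy vectors in place of real OpenAI embeddings (so no API call is needed):

```python
import numpy as np

# Illustration of the ranking in search_chunks(): each stored chunk
# carries a precomputed embedding, and the query vector is scored
# against each via the dot product, highest first.
def rank_chunks(query_vec, data, count=1):
    points = [{'content': item['content'],
               'point': float(np.dot(query_vec, item['vector']))}
              for item in data]
    return sorted(points, key=lambda d: d['point'], reverse=True)[:count]

data = [
    {'content': 'note about cats', 'vector': [1.0, 0.0]},
    {'content': 'note about dogs', 'vector': [0.0, 1.0]},
]
top = rank_chunks([0.9, 0.1], data, count=1)
print(top[0]['content'])  # → note about cats
```

Note that real OpenAI embeddings are unit-normalized, so the dot product behaves like cosine similarity; with arbitrary vectors, as here, it is only an unnormalized score.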


def gpt3(prompt, model, params):
response = openai.Completion.create(
model=model,
prompt=prompt,
temperature=params.temp,
max_tokens=params.max_tokens,
top_p=params.top_p,
frequency_penalty=params.frequency_penalty,
presence_penalty=params.present_penalty
)
text = response['choices'][0]['text'].strip()
return text
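
The `params` argument threaded through these helpers is just an object exposing the UI slider values as attributes. A hypothetical stand-in (attribute names taken from the calls above; the values are arbitrary examples, not defaults from the project):

```python
from types import SimpleNamespace

# Minimal stand-in for the params object read by gpt3()/gpt35():
# only these attributes are accessed by the helpers in this file.
params = SimpleNamespace(
    temp=0.7,               # sampling temperature
    max_tokens=256,         # completion length cap
    top_p=1.0,              # nucleus sampling cutoff
    frequency_penalty=0.0,
    present_penalty=0.0,    # note the shortened name used in this codebase
    chunk_count=1,          # chunks retrieved per query (used by query.run)
)
print(params.temp, params.max_tokens)
```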


def gpt35(prompt, params, system_role_content: str = 'You are a helpful assistant.'):
completions = openai.ChatCompletion.create(
model="gpt-3.5-turbo",
max_tokens=params.max_tokens,
temperature=params.temp,
top_p=params.top_p,
frequency_penalty=params.frequency_penalty,
presence_penalty=params.present_penalty,
messages=[
{"role": "system", "content": system_role_content},
{"role": "user", "content": prompt}
])
text = completions['choices'][0]['message']['content']
return text


def gpt3_stream(prompt, model, params):
response = openai.Completion.create(
model=model,
stream=True,
prompt=prompt,
temperature=params.temp,
max_tokens=params.max_tokens,
top_p=params.top_p,
frequency_penalty=params.frequency_penalty,
presence_penalty=params.present_penalty
)
return response


def gpt35_stream(prompt, params, system_role_content: str = 'You are a helpful assistant.'):
completions = openai.ChatCompletion.create(
model="gpt-3.5-turbo",
max_tokens=params.max_tokens,
temperature=params.temp,
top_p=params.top_p,
frequency_penalty=params.frequency_penalty,
presence_penalty=params.present_penalty,
stream=True,
messages=[
{"role": "system", "content": system_role_content},
{"role": "user", "content": prompt}
])
return completions
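
Both `*_stream` helpers return the raw streaming iterator from the client rather than assembled text. A sketch of how a caller might join the streamed chat deltas, assuming the pre-1.0 `openai` client's chunk shape (the fake stream below is illustrative only):

```python
# Assemble the incremental deltas yielded by a ChatCompletion stream.
# Chunks without a 'content' key (e.g. the initial role delta) are skipped.
def collect_chat_stream(stream):
    parts = []
    for chunk in stream:
        delta = chunk['choices'][0].get('delta', {})
        if 'content' in delta:
            parts.append(delta['content'])
    return ''.join(parts)

# With a real client: collect_chat_stream(gpt35_stream(prompt, params))
fake_stream = [
    {'choices': [{'delta': {'role': 'assistant'}}]},
    {'choices': [{'delta': {'content': 'Hel'}}]},
    {'choices': [{'delta': {'content': 'lo'}}]},
]
print(collect_chat_stream(fake_stream))  # → Hello
```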
40 changes: 27 additions & 13 deletions GPT/query.py
@@ -11,20 +11,20 @@

openai.api_key = API_KEY


SESSION_LANG = st.session_state['SESSION_LANGUAGE']
_ = language.set_language()


def build(chunk_size=4000):
openai.api_key = API_KEY
all_text = util.read_file(r'.user\input.txt')

# split text into smaller chunk of 4000 char each
chunks = textwrap.wrap(all_text, chunk_size)
chunk_count = len(chunks)
result = []
for idx, chunk in enumerate(chunks):
embedding = GPT.toolkit.embedding(chunk.encode(encoding='ASCII', errors='ignore').decode())
embedding = GPT.gpt_tools.embedding(chunk.encode(encoding='ASCII', errors='ignore').decode())
info = {'content': chunk, 'vector': embedding}
print(info, '\n\n\n')

@@ -38,7 +38,7 @@ def build(chunk_size=4000):
def run(query, model, prompt_file, isQuestion, params, info_file=None):
if isQuestion:
data = util.read_json(INFO.BRAIN_DATA)
results = GPT.toolkit.search_chunks(query, data, params.chunk_count)
results = GPT.gpt_tools.search_chunks(query, data, params.chunk_count)
answers = []
for result in results:
my_info = util.read_file(info_file)
@@ -47,35 +47,49 @@ def run(query, model, prompt_file, isQuestion, params, info_file=None):
prompt = prompt.replace('<<QS>>', query)
prompt = prompt.replace('<<MY-INFO>>', my_info)

answer = GPT.toolkit.gpt3(prompt, model, params)
if model == 'gpt-3.5-turbo':
answer = GPT.gpt_tools.gpt35(prompt, params)
else:
answer = GPT.gpt_tools.gpt3(prompt, model, params)
answers.append(answer)
all_response = '\n\n'.join(answers)
else:
chunks = textwrap.wrap(query, 10000)
responses = []
for chunk in chunks:
prompt = util.read_file(prompt_file).replace('<<DATA>>', chunk)
response = GPT.toolkit.gpt3(prompt, model, params)
if model == 'gpt-3.5-turbo':
response = GPT.gpt_tools.gpt35(prompt, params)
else:
response = GPT.gpt_tools.gpt3(prompt, model, params)
responses.append(response)
all_response = '\n\n'.join(responses)
return all_response


def run_stream(query, model, prompt_file, isQuestion, params, info_file=None):
client = None
def get_stream_prompt(query, prompt_file, isQuestion, info_file=None):
openai.api_key = API_KEY
if isQuestion:
data = util.read_json(INFO.BRAIN_DATA)
results = GPT.toolkit.search_chunks(query, data, count=1)
for result in results:
if data:
result = GPT.gpt_tools.search_chunks(query, data, count=1)
my_info = util.read_file(info_file)
prompt = util.read_file(prompt_file)
prompt = prompt.replace('<<INFO>>', result['content'])
prompt = prompt.replace('<<INFO>>', result[0]['content'])
prompt = prompt.replace('<<QS>>', query)
prompt = prompt.replace('<<MY-INFO>>', my_info)
client = GPT.toolkit.gpt3_stream(API_KEY, prompt, model, params)

else:
prompt = ''
else:
chunk = textwrap.wrap(query, 10000)[0]
prompt = util.read_file(prompt_file).replace('<<DATA>>', chunk)
client = GPT.toolkit.gpt3_stream(API_KEY, prompt, model, params)
return prompt
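
The prompt files consumed here carry `<<INFO>>`, `<<QS>>` and `<<MY-INFO>>` markers that are swapped for retrieved note content and the user's question. A self-contained sketch of that substitution (the template string is hypothetical; the real templates live in the repository's prompt files):

```python
# Fill a prompt template by replacing its placeholder markers, as
# run() and get_stream_prompt() do via chained str.replace calls.
def fill_prompt(template, info, question, my_info=''):
    return (template
            .replace('<<INFO>>', info)
            .replace('<<QS>>', question)
            .replace('<<MY-INFO>>', my_info))

template = 'Context: <<INFO>>\nQuestion: <<QS>>'
print(fill_prompt(template, 'note text', 'What is X?'))
```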


def run_stream(query, model, prompt_file, isQuestion, params, info_file=None):
prompt = get_stream_prompt(query, prompt_file, isQuestion, info_file)
if model == 'gpt-3.5-turbo':
client = GPT.gpt_tools.gpt35_stream(prompt, params)
else:
client = GPT.gpt_tools.gpt3_stream(prompt, model, params)
return client
71 changes: 0 additions & 71 deletions GPT/toolkit.py

This file was deleted.

28 changes: 20 additions & 8 deletions README.md
@@ -9,8 +9,23 @@

*💡As I am not a professional programmer and am fairly new to Python, this project may contain bugs. If you encounter any issues, please suggest them in the [Issues section](https://github.com/sean1832/GPT-Brain/issues).*

### Description
This program leverages the power of [GPT-3](https://platform.openai.com/docs/models/gpt-3) & [3.5](https://platform.openai.com/docs/models/gpt-3-5) to provide a summary of the content of atomic notes,
as well as answer questions related specifically to your notes.
The program scans a designated directory,
which is typically a vault containing multiple notes,
and appends the contents of all the notes to a single file.
This file then serves as the context for the user's query.
The program is able to identify
relationships between the contents of the notes,
and generate a refined response that summarizes the key points.

Although the program is compatible with other note-taking software that uses
markdown or txt,
it is primarily designed with [Obsidian](https://obsidian.md/) in mind.

### Features
- [x] Use [OpenAI GPT-3](https://platform.openai.com/docs/models/gpt-3) to generate responses.
- [x] Use [OpenAI GPT-3](https://platform.openai.com/docs/models/gpt-3) and [GPT-3.5 (ChatGPT)](https://platform.openai.com/docs/models/gpt-3-5) to generate responses.
- [x] Use [OpenAI embeddings](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings) for semantic comparison of questions and note content for enhanced search.
- [x] Configurable prompts.
- [x] Customizable personal background information for more accurate answers.
@@ -23,13 +38,10 @@
- [x] Basic & Advanced parameter sliders for OpenAI Language model configurations.

### Todo
- [x] ~~Batch script to update library.~~
- [x] ~~Versioning.~~
- [x] ~~Tooltips for parameters.~~
- [x] ~~Multilingual support for UI.~~
- [x] ~~Multilingual search support.~~
- [ ] Provide detailed documentation for users.
- [ ] Release for Windows.
- [ ] Support PDF format.
- [ ] Support PDF OCR scan.
- [ ] Support Word documents.


## Install
### 1. What you need
