Merge pull request #13 from sean1832/Major_Dev
Major dev 1.0.0
sean1832 authored Mar 3, 2023
2 parents 67fba77 + 75fbee3 commit 39e0ee9
Showing 15 changed files with 320 additions and 193 deletions.
2 changes: 1 addition & 1 deletion .core/manifest.json
@@ -1,6 +1,6 @@
{
"name": "GPT-Brain",
"version": "0.1.1",
"version": "1.0.0",
"license": "MIT",
"author": "Zeke Zhang",
"homepage": "https://github.com/sean1832/GPT-Brain",
19 changes: 11 additions & 8 deletions Documentation/README_CN.md
@@ -9,8 +9,15 @@

*💡I am not a professional programmer and am quite new to Python, so this project may contain all sorts of bugs. If you run into one, please report it in the [Issues section](https://github.com/sean1832/GPT-Brain/issues) and I will do my best to patch it.*

### Introduction
This program leverages [GPT-3](https://platform.openai.com/docs/models/gpt-3) & [3.5](https://platform.openai.com/docs/models/gpt-3-5) to summarize the content of atomic notes and to answer questions about specific note content.
It scans a designated directory (typically a vault containing multiple notes) and appends the contents of all the notes to a single file.
That file then serves as the context for the user's query. The program can identify relationships between the contents of the notes and generate a refined response that summarizes the key points.

Although the program is compatible with other note-taking software that uses markdown or txt, it is primarily designed with [Obsidian](https://obsidian.md/) in mind.

### Features
- [x] Generate responses using [OpenAI GPT-3](https://platform.openai.com/docs/models/gpt-3).
- [x] Generate responses using [OpenAI GPT-3](https://platform.openai.com/docs/models/gpt-3) & [GPT-3.5 (ChatGPT)](https://platform.openai.com/docs/models/gpt-3-5).
- [x] Use [OpenAI embeddings](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings) for semantic comparison of questions and note content for enhanced search.
- [x] Configurable prompts.
- [x] Customizable personal background information for more relevant answers.
@@ -23,13 +30,9 @@
- [x] Basic & advanced parameter sliders for easy tuning of OpenAI language model configurations.

### Todo
- [x] ~~Batch script to update libraries.~~
- [x] ~~Versioning.~~
- [x] ~~Tooltips for parameters.~~
- [x] ~~Multilingual UI support.~~
- [x] ~~Multilingual search support.~~
- [ ] Provide detailed documentation for users.
- [ ] Release for Windows.
- [ ] Support PDF note format.
- [ ] Support PDF OCR scanning.
- [ ] Support Word document format.

## Install
### 1. What you need
23 changes: 15 additions & 8 deletions Documentation/README_JP.md
@@ -9,8 +9,19 @@

*💡I am not a professional programmer and am fairly new to Python, so this project may contain bugs. If you find any issues, please suggest them in the [Issues section](https://github.com/sean1832/GPT-Brain/issues).*

### Introduction
This program leverages [GPT-3](https://platform.openai.com/docs/models/gpt-3) & [3.5](https://platform.openai.com/docs/models/gpt-3-5) to summarize the content of atomic notes and to answer questions related to specific notes.
It scans a designated directory, typically a vault containing multiple notes, and appends the contents of all the notes to a single file.
That file then serves as the context for the user's query. The program can identify relationships between the contents of the notes and generate a refined response that summarizes the key points.

Although the program is compatible with other note-taking software that uses markdown or txt, it is primarily designed with [Obsidian](https://obsidian.md/) in mind.

### Features
- [x] Generate responses using [OpenAI GPT-3](https://platform.openai.com/docs/models/gpt-3).
- [x] Generate responses using [OpenAI GPT-3](https://platform.openai.com/docs/models/gpt-3) & [GPT-3.5 (ChatGPT)](https://platform.openai.com/docs/models/gpt-3-5).
- [x] Use [OpenAI embeddings](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings) for semantic comparison of questions and note content to enhance search.
- [x] Configurable prompts.
- [x] Customizable personal background information for more accurate answers.
@@ -23,13 +34,9 @@
- [x] Basic & advanced parameter sliders for OpenAI language model configurations.

### Todo
- [x] ~~Batch script to update libraries.~~
- [x] ~~Versioning.~~
- [x] ~~Tooltips for parameters.~~
- [x] ~~Multilingual UI.~~
- [x] ~~Multilingual search support.~~
- [ ] Provide detailed documentation for users.
- [ ] Release for Windows.
- [ ] Support PDF.
- [ ] Support PDF OCR scanning.
- [ ] Support Word documents.

## Install
### 1. What you need
2 changes: 1 addition & 1 deletion GPT/__init__.py
@@ -1,3 +1,3 @@
from GPT import query
from GPT import toolkit
from GPT import gpt_tools
from GPT import model
94 changes: 94 additions & 0 deletions GPT/gpt_tools.py
@@ -0,0 +1,94 @@
import openai
import numpy as np
import requests
import sseclient


# Compare the similarity of two vectors via the dot product.
# The higher the dot product, the more similar the vectors.
def similarity(v1, v2):
return np.dot(v1, v2)


# return the embedding vector for the given content
def embedding(content, engine='text-embedding-ada-002'):
response = openai.Embedding.create(input=content, engine=engine)
vector = response['data'][0]['embedding']
return vector


def search_chunks(query, data, count=1):
vector = embedding(query)
points = []

for item in data:
# compare search terms with brain-data
point = similarity(vector, item['vector'])
points.append({
'content': item['content'],
'point': point
})
# sort points in descending order
ordered = sorted(points, key=lambda d: d['point'], reverse=True)

return ordered[0:count]
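
The search above ranks note chunks by the dot product of their embedding vectors against the query's embedding. A minimal sketch of that ranking logic, using hypothetical toy vectors in place of real OpenAI embeddings (so no API call is needed):

```python
import numpy as np

# Illustration of the ranking in search_chunks(): each stored chunk
# carries a precomputed embedding, and the query vector is scored
# against each via the dot product, highest first.
def rank_chunks(query_vec, data, count=1):
    points = [{'content': item['content'],
               'point': float(np.dot(query_vec, item['vector']))}
              for item in data]
    return sorted(points, key=lambda d: d['point'], reverse=True)[:count]

data = [
    {'content': 'note about cats', 'vector': [1.0, 0.0]},
    {'content': 'note about dogs', 'vector': [0.0, 1.0]},
]
top = rank_chunks([0.9, 0.1], data, count=1)
print(top[0]['content'])  # → note about cats
```

Note that real OpenAI embeddings are unit-normalized, so the dot product behaves like cosine similarity; with arbitrary vectors, as here, it is only an unnormalized score.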


def gpt3(prompt, model, params):
response = openai.Completion.create(
model=model,
prompt=prompt,
temperature=params.temp,
max_tokens=params.max_tokens,
top_p=params.top_p,
frequency_penalty=params.frequency_penalty,
presence_penalty=params.present_penalty
)
text = response['choices'][0]['text'].strip()
return text
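
The `params` argument threaded through these helpers is just an object exposing the UI slider values as attributes. A hypothetical stand-in (attribute names taken from the calls above; the values are arbitrary examples, not defaults from the project):

```python
from types import SimpleNamespace

# Minimal stand-in for the params object read by gpt3()/gpt35():
# only these attributes are accessed by the helpers in this file.
params = SimpleNamespace(
    temp=0.7,               # sampling temperature
    max_tokens=256,         # completion length cap
    top_p=1.0,              # nucleus sampling cutoff
    frequency_penalty=0.0,
    present_penalty=0.0,    # note the shortened name used in this codebase
    chunk_count=1,          # chunks retrieved per query (used by query.run)
)
print(params.temp, params.max_tokens)
```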


def gpt35(prompt, params, system_role_content: str = 'You are a helpful assistant.'):
completions = openai.ChatCompletion.create(
model="gpt-3.5-turbo",
max_tokens=params.max_tokens,
temperature=params.temp,
top_p=params.top_p,
frequency_penalty=params.frequency_penalty,
presence_penalty=params.present_penalty,
messages=[
{"role": "system", "content": system_role_content},
{"role": "user", "content": prompt}
])
text = completions['choices'][0]['message']['content']
return text


def gpt3_stream(prompt, model, params):
response = openai.Completion.create(
model=model,
stream=True,
prompt=prompt,
temperature=params.temp,
max_tokens=params.max_tokens,
top_p=params.top_p,
frequency_penalty=params.frequency_penalty,
presence_penalty=params.present_penalty
)
return response


def gpt35_stream(prompt, params, system_role_content: str = 'You are a helpful assistant.'):
completions = openai.ChatCompletion.create(
model="gpt-3.5-turbo",
max_tokens=params.max_tokens,
temperature=params.temp,
top_p=params.top_p,
frequency_penalty=params.frequency_penalty,
presence_penalty=params.present_penalty,
stream=True,
messages=[
{"role": "system", "content": system_role_content},
{"role": "user", "content": prompt}
])
return completions
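
Both `*_stream` helpers return the raw streaming iterator from the client rather than assembled text. A sketch of how a caller might join the streamed chat deltas, assuming the pre-1.0 `openai` client's chunk shape (the fake stream below is illustrative only):

```python
# Assemble the incremental deltas yielded by a ChatCompletion stream.
# Chunks without a 'content' key (e.g. the initial role delta) are skipped.
def collect_chat_stream(stream):
    parts = []
    for chunk in stream:
        delta = chunk['choices'][0].get('delta', {})
        if 'content' in delta:
            parts.append(delta['content'])
    return ''.join(parts)

# With a real client: collect_chat_stream(gpt35_stream(prompt, params))
fake_stream = [
    {'choices': [{'delta': {'role': 'assistant'}}]},
    {'choices': [{'delta': {'content': 'Hel'}}]},
    {'choices': [{'delta': {'content': 'lo'}}]},
]
print(collect_chat_stream(fake_stream))  # → Hello
```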
40 changes: 27 additions & 13 deletions GPT/query.py
@@ -11,20 +11,20 @@

openai.api_key = API_KEY


SESSION_LANG = st.session_state['SESSION_LANGUAGE']
_ = language.set_language()


def build(chunk_size=4000):
openai.api_key = API_KEY
all_text = util.read_file(r'.user\input.txt')

# split text into smaller chunk of 4000 char each
chunks = textwrap.wrap(all_text, chunk_size)
chunk_count = len(chunks)
result = []
for idx, chunk in enumerate(chunks):
embedding = GPT.toolkit.embedding(chunk.encode(encoding='ASCII', errors='ignore').decode())
embedding = GPT.gpt_tools.embedding(chunk.encode(encoding='ASCII', errors='ignore').decode())
info = {'content': chunk, 'vector': embedding}
print(info, '\n\n\n')

@@ -38,7 +38,7 @@ def build(chunk_size=4000):
def run(query, model, prompt_file, isQuestion, params, info_file=None):
if isQuestion:
data = util.read_json(INFO.BRAIN_DATA)
results = GPT.toolkit.search_chunks(query, data, params.chunk_count)
results = GPT.gpt_tools.search_chunks(query, data, params.chunk_count)
answers = []
for result in results:
my_info = util.read_file(info_file)
@@ -47,35 +47,49 @@ def run(query, model, prompt_file, isQuestion, params, info_file=None):
prompt = prompt.replace('<<QS>>', query)
prompt = prompt.replace('<<MY-INFO>>', my_info)

answer = GPT.toolkit.gpt3(prompt, model, params)
if model == 'gpt-3.5-turbo':
answer = GPT.gpt_tools.gpt35(prompt, params)
else:
answer = GPT.gpt_tools.gpt3(prompt, model, params)
answers.append(answer)
all_response = '\n\n'.join(answers)
else:
chunks = textwrap.wrap(query, 10000)
responses = []
for chunk in chunks:
prompt = util.read_file(prompt_file).replace('<<DATA>>', chunk)
response = GPT.toolkit.gpt3(prompt, model, params)
if model == 'gpt-3.5-turbo':
response = GPT.gpt_tools.gpt35(prompt, params)
else:
response = GPT.gpt_tools.gpt3(prompt, model, params)
responses.append(response)
all_response = '\n\n'.join(responses)
return all_response


def run_stream(query, model, prompt_file, isQuestion, params, info_file=None):
client = None
def get_stream_prompt(query, prompt_file, isQuestion, info_file=None):
openai.api_key = API_KEY
if isQuestion:
data = util.read_json(INFO.BRAIN_DATA)
results = GPT.toolkit.search_chunks(query, data, count=1)
for result in results:
if data:
result = GPT.gpt_tools.search_chunks(query, data, count=1)
my_info = util.read_file(info_file)
prompt = util.read_file(prompt_file)
prompt = prompt.replace('<<INFO>>', result['content'])
prompt = prompt.replace('<<INFO>>', result[0]['content'])
prompt = prompt.replace('<<QS>>', query)
prompt = prompt.replace('<<MY-INFO>>', my_info)
client = GPT.toolkit.gpt3_stream(API_KEY, prompt, model, params)

else:
prompt = ''
else:
chunk = textwrap.wrap(query, 10000)[0]
prompt = util.read_file(prompt_file).replace('<<DATA>>', chunk)
client = GPT.toolkit.gpt3_stream(API_KEY, prompt, model, params)
return prompt
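
The prompt files consumed here carry `<<INFO>>`, `<<QS>>` and `<<MY-INFO>>` markers that are swapped for retrieved note content and the user's question. A self-contained sketch of that substitution (the template string is hypothetical; the real templates live in the repository's prompt files):

```python
# Fill a prompt template by replacing its placeholder markers, as
# run() and get_stream_prompt() do via chained str.replace calls.
def fill_prompt(template, info, question, my_info=''):
    return (template
            .replace('<<INFO>>', info)
            .replace('<<QS>>', question)
            .replace('<<MY-INFO>>', my_info))

template = 'Context: <<INFO>>\nQuestion: <<QS>>'
print(fill_prompt(template, 'note text', 'What is X?'))
```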


def run_stream(query, model, prompt_file, isQuestion, params, info_file=None):
prompt = get_stream_prompt(query, prompt_file, isQuestion, info_file)
if model == 'gpt-3.5-turbo':
client = GPT.gpt_tools.gpt35_stream(prompt, params)
else:
client = GPT.gpt_tools.gpt3_stream(prompt, model, params)
return client
71 changes: 0 additions & 71 deletions GPT/toolkit.py

This file was deleted.

28 changes: 20 additions & 8 deletions README.md
@@ -9,8 +9,23 @@

*💡As I am not a professional programmer and am fairly new to Python, this project may contain bugs. If you encounter any issues, please suggest them in the [Issues section](https://github.com/sean1832/GPT-Brain/issues).*

### Description
This program leverages the power of [GPT-3](https://platform.openai.com/docs/models/gpt-3) & [3.5](https://platform.openai.com/docs/models/gpt-3-5) to provide a summary of the content of atomic notes,
as well as answer questions related specifically to your notes.
The program scans a designated directory,
which is typically a vault containing multiple notes,
and appends the contents of all the notes to a single file.
This file then serves as the context for the user's query.
The program is able to identify
relationships between the contents of the notes,
and generate a refined response that summarizes the key points.

Although the program is compatible with other note-taking software that uses
markdown or txt,
it is primarily designed with [Obsidian](https://obsidian.md/) in mind.

### Features
- [x] Use [OpenAI GPT-3](https://platform.openai.com/docs/models/gpt-3) to generate responses.
- [x] Use [OpenAI GPT-3](https://platform.openai.com/docs/models/gpt-3) and [GPT-3.5 (ChatGPT)](https://platform.openai.com/docs/models/gpt-3-5) to generate responses.
- [x] Use [OpenAI embeddings](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings) for semantic comparison of questions and note content for enhanced search.
- [x] Configurable prompts.
- [x] Customizable personal background information for more accurate answers.
@@ -23,13 +38,10 @@
- [x] Basic & Advanced parameter sliders for OpenAI Language model configurations.

### Todo
- [x] ~~Batch script to update library.~~
- [x] ~~Versioning.~~
- [x] ~~Tooltips for parameters.~~
- [x] ~~Multilingual support for UI.~~
- [x] ~~Multilingual search support.~~
- [ ] Provide detailed documentation for users.
- [ ] Release for Windows.
- [ ] Support PDF format.
- [ ] Support PDF OCR scan.
- [ ] Support Word documents.


## Install
### 1. What you need
