Commit 6915c50

Merge pull request #6166 from oobabooga/dev

Merge dev branch

oobabooga committed Jun 27, 2024
2 parents 4820ae9 + 8ec8bc0
Showing 42 changed files with 524 additions and 245 deletions.
.github/workflows/stale.yml (2 additions, 2 deletions)

@@ -13,8 +13,8 @@ jobs:
      - uses: actions/stale@v5
        with:
          stale-issue-message: ""
-         close-issue-message: "This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment."
-         days-before-issue-stale: 60
+         close-issue-message: "This issue has been closed due to inactivity for 6 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment."
+         days-before-issue-stale: 180
          days-before-issue-close: 0
          stale-issue-label: "stale"
          days-before-pr-stale: -1
README.md (8 additions, 8 deletions)

@@ -11,7 +11,7 @@ Its goal is to become the [AUTOMATIC1111/stable-diffusion-webui](https://github.
## Features

* 3 interface modes: default (two columns), notebook, and chat.
-* Multiple model backends: [Transformers](https://github.com/huggingface/transformers), [llama.cpp](https://github.com/ggerganov/llama.cpp) (through [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)), [ExLlamaV2](https://github.com/turboderp/exllamav2), [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ), [AutoAWQ](https://github.com/casper-hansen/AutoAWQ).
+* Multiple model backends: [Transformers](https://github.com/huggingface/transformers), [llama.cpp](https://github.com/ggerganov/llama.cpp) (through [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)), [ExLlamaV2](https://github.com/turboderp/exllamav2), [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ), [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM).
* Dropdown menu for quickly switching between different models.
* Large number of extensions (built-in and user-contributed), including Coqui TTS for realistic voice outputs, Whisper STT for voice inputs, translation, [multimodal pipelines](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/multimodal), vector databases, Stable Diffusion integration, and a lot more. See [the wiki](https://github.com/oobabooga/text-generation-webui/wiki/07-%E2%80%90-Extensions) and [the extensions directory](https://github.com/oobabooga/text-generation-webui-extensions) for details.
* [Chat with custom characters](https://github.com/oobabooga/text-generation-webui/wiki/03-%E2%80%90-Parameters-Tab#character).
@@ -76,12 +76,12 @@ conda activate textgen

| System | GPU | Command |
|--------|---------|---------|
-| Linux/WSL | NVIDIA | `pip3 install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu121` |
-| Linux/WSL | CPU only | `pip3 install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cpu` |
-| Linux | AMD | `pip3 install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/rocm5.6` |
-| MacOS + MPS | Any | `pip3 install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1` |
-| Windows | NVIDIA | `pip3 install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu121` |
-| Windows | CPU only | `pip3 install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1` |
+| Linux/WSL | NVIDIA | `pip3 install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121` |
+| Linux/WSL | CPU only | `pip3 install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cpu` |
+| Linux | AMD | `pip3 install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/rocm5.6` |
+| MacOS + MPS | Any | `pip3 install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2` |
+| Windows | NVIDIA | `pip3 install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121` |
+| Windows | CPU only | `pip3 install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2` |

The up-to-date commands can be found here: https://pytorch.org/get-started/locally/.

@@ -146,7 +146,7 @@ Then browse to
1) For Kepler GPUs and older, you will need to install CUDA 11.8 instead of 12:

```
-pip3 install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu118
+pip3 install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
conda install -y -c "nvidia/label/cuda-11.8.0" cuda-runtime
```

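After updating PyTorch with one of the commands above, you can confirm that the expected build was installed (a minimal sanity check, not part of the README; the second value prints `False` on CPU-only and macOS builds):

```
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```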
css/html_instruct_style.css (2 additions, 2 deletions)

@@ -62,8 +62,8 @@

.gradio-container .chat .user-message {
padding: 20px;
-padding-left: 0px;
-padding-right: 0px;
+padding-left: 0;
+padding-right: 0;
background-color: transparent;
border-radius: 8px;
border-bottom-right-radius: 0;
css/main.css (53 additions, 4 deletions)

@@ -96,7 +96,7 @@ gradio-app > :first-child {

.header_bar {
background-color: #f7f7f7;
-box-shadow: 0 0px 3px rgba(22 22 22 / 35%);
+box-shadow: 0 0 3px rgba(22 22 22 / 35%);
margin-bottom: 0;
overflow-x: scroll;
margin-left: calc(-1 * var(--size-4));
@@ -221,6 +221,7 @@ button {

.pretty_scrollbar::-webkit-scrollbar {
width: 7px;
+height: 7px;
}

.pretty_scrollbar::-webkit-scrollbar-track {
@@ -245,6 +246,10 @@ button {
background: #374151;
}

+.pretty_scrollbar::-webkit-scrollbar-corner {
+background: transparent;
+}

audio {
max-width: 100%;
}
@@ -433,12 +438,12 @@ div.svelte-362y77>*, div.svelte-362y77>.form>* {
.message-body code {
white-space: pre-wrap !important;
word-wrap: break-word !important;
-border: 1px solid #666666;
+border: 1px solid #666;
border-radius: 5px;
font-size: 82%;
padding: 1px 3px;
background: #0d1117 !important;
-color: rgb(201, 209, 217);
+color: rgb(201 209 217);
}

.message-body pre > code {
@@ -695,7 +700,7 @@ div.svelte-362y77>*, div.svelte-362y77>.form>* {
@media screen and (width >= 1327px) {
#past-chats-row {
position: absolute;
-top: 16px;
+top: 36px;
left: 0;
width: calc(0.5*(var(--document-width) - 880px - 120px - 16px*2));
max-width: 300px;
@@ -743,3 +748,47 @@ div.svelte-362y77>*, div.svelte-362y77>.form>* {
display: none;
}
}

+#past-chats {
+  max-height: calc(100vh - 195px);
+  overflow-y: scroll !important;
+  border-radius: 0;
+  scrollbar-width: none; /* Hide scrollbar in Firefox by default */
+}
+
+#past-chats label {
+  width: 100%;
+  background-color: transparent !important;
+  background: none;
+  border: 0;
+  border-radius: 0;
+  padding-top: 8px;
+  padding-bottom: 8px;
+}
+
+#past-chats > :nth-child(2) {
+  display: none;
+}
+
+#past-chats > :nth-child(3) {
+  gap: 0;
+}
+
+#past-chats::-webkit-scrollbar {
+  display: none;
+}
+
+#past-chats:hover {
+  scrollbar-width: auto;
+}
+
+#past-chats:hover::-webkit-scrollbar {
+  display: block;
+}
+
+@media screen and (width < 1327px) {
+  #past-chats {
+    max-height: 300px;
+  }
+}

docker/TensorRT-LLM/Dockerfile (new file, 27 additions)

@@ -0,0 +1,27 @@
+FROM pytorch/pytorch:2.2.1-cuda12.1-cudnn8-runtime
+
+# Install Git
+RUN apt update && apt install -y git
+
+# System-wide TensorRT-LLM requirements
+RUN apt install -y openmpi-bin libopenmpi-dev
+
+# Set the working directory
+WORKDIR /app
+
+# Install text-generation-webui
+RUN git clone https://github.com/oobabooga/text-generation-webui
+WORKDIR /app/text-generation-webui
+RUN pip install -r requirements.txt
+
+# This is needed to avoid an error about "Failed to build mpi4py" in the next command
+ENV LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
+
+# Install TensorRT-LLM
+RUN pip3 install tensorrt_llm==0.10.0 -U --pre --extra-index-url https://pypi.nvidia.com
+
+# Expose the necessary port for the Python server
+EXPOSE 7860 5000
+
+# Run the Python server.py script with the specified command
+CMD ["python", "server.py", "--api", "--listen"]
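A hypothetical build-and-run sequence for this image, run from the repository root (the `tgw-trtllm` tag is illustrative, not part of the repository, and `--gpus all` assumes the NVIDIA Container Toolkit is installed):

```
# Build the image; the Dockerfile clones text-generation-webui itself, so no local checkout is copied in
docker build -t tgw-trtllm -f docker/TensorRT-LLM/Dockerfile .

# Run with GPU access, publishing the web UI (7860) and API (5000) ports exposed above
docker run --gpus all -p 7860:7860 -p 5000:5000 tgw-trtllm
```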
docs/02 - Default and Notebook Tabs.md (2 additions, 2 deletions)

@@ -18,13 +18,13 @@ In the **Prompt** menu, you can select from some predefined prompts defined unde

### Output

-Four tabs can be found:
+Five tabs can be found:

* **Raw**: where the raw text generated by the model appears.
* **Markdown**: it contains a "Render" button. You can click on it at any time to render the current output as markdown. This is particularly useful for models that generate LaTeX equations like GALACTICA.
* **HTML**: displays the output in an HTML style that is meant to be easier to read. Its style is defined under `text-generation-webui/css/html_readable_style.css`.
* **Logits**: when you click on "Get next token probabilities", this tab displays the 50 most likely next tokens and their probabilities based on your current input. If "Use samplers" is checked, the probabilities will be the ones after the sampling parameters in the "Parameters" > "Generation" tab are applied. Otherwise, they will be the raw probabilities generated by the model.
-* **Tokens**: allows you to tokenize your prompt and see the ID numbers for the individuals tokens.
+* **Tokens**: allows you to tokenize your prompt and see the ID numbers for the individual tokens.

## Notebook tab

docs/12 - OpenAI API.md (1 addition, 1 deletion)

@@ -219,7 +219,7 @@ print()

### Environment variables

-The following environment variables can be used (they take precendence over everything else):
+The following environment variables can be used (they take precedence over everything else):

| Variable Name | Description | Example Value |
|------------------------|------------------------------------|----------------------------|
docs/README.md (1 addition, 1 deletion)

@@ -1,4 +1,4 @@
-These files is a mirror of the documentation at:
+These files are a mirror of the documentation at:

# https://github.com/oobabooga/text-generation-webui/wiki

js/main.js (5 additions, 14 deletions)

@@ -98,20 +98,6 @@ document.addEventListener("keydown", function(event) {
document.getElementById("Impersonate").click();
}

-// Switch between tabs on Tab
-else if (!event.ctrlKey && !event.shiftKey && !event.altKey && !event.metaKey && event.key === "Tab") {
-event.preventDefault();
-var parametersButton = document.getElementById("parameters-button");
-var parentContainer = parametersButton.parentNode;
-var selectedChild = parentContainer.querySelector(".selected");
-
-if (selectedChild.id == "parameters-button") {
-document.getElementById(previousTabId).click();
-} else {
-previousTabId = selectedChild.id;
-parametersButton.click();
-}
-}
});

//------------------------------------------------
@@ -548,3 +534,8 @@ document.querySelectorAll(".focus-on-chat-input").forEach(element => {
document.querySelector("#chat-input textarea").focus();
});
});

+//------------------------------------------------
+// Fix a border around the "past chats" menu
+//------------------------------------------------
+document.getElementById("past-chats").parentNode.style.borderRadius = "0px";
modules/LoRA.py (1 addition, 1 deletion)

@@ -73,7 +73,7 @@ def add_lora_autogptq(lora_names):
if len(lora_names) > 1:
logger.warning('AutoGPTQ can only work with 1 LoRA at the moment. Only the first one in the list will be loaded.')
if not shared.args.no_inject_fused_attention:
-logger.warning('Fused Atttention + AutoGPTQ may break Lora loading. Disable it.')
+logger.warning('Fused Attention + AutoGPTQ may break Lora loading. Disable it.')

peft_config = GPTQLoraConfig(
inference_mode=True,
modules/RoPE.py (deleted, 18 deletions)

This file was deleted.
modules/block_requests.py (18 additions, 10 deletions)

@@ -43,19 +43,27 @@ def my_open(*args, **kwargs):
        with original_open(*args, **kwargs) as f:
            file_contents = f.read()

-       file_contents = file_contents.replace(b'\t\t<script\n\t\t\tsrc="https://cdnjs.cloudflare.com/ajax/libs/iframe-resizer/4.3.9/iframeResizer.contentWindow.min.js"\n\t\t\tasync\n\t\t></script>', b'')
-       file_contents = file_contents.replace(b'cdnjs.cloudflare.com', b'127.0.0.1')
+       if len(args) > 1 and args[1] == 'rb':
+           file_contents = file_contents.decode('utf-8')
+
+       file_contents = file_contents.replace('\t\t<script\n\t\t\tsrc="https://cdnjs.cloudflare.com/ajax/libs/iframe-resizer/4.3.9/iframeResizer.contentWindow.min.js"\n\t\t\tasync\n\t\t></script>', '')
+       file_contents = file_contents.replace('cdnjs.cloudflare.com', '127.0.0.1')
        file_contents = file_contents.replace(
-           b'</head>',
-           b'\n <script src="file/js/katex/katex.min.js"></script>'
-           b'\n <script src="file/js/katex/auto-render.min.js"></script>'
-           b'\n <script src="file/js/highlightjs/highlight.min.js"></script>'
-           b'\n <script src="file/js/highlightjs/highlightjs-copy.min.js"></script>'
-           b'\n <script>hljs.addPlugin(new CopyButtonPlugin());</script>'
-           b'\n </head>'
+           '</head>',
+           '\n <script src="file/js/katex/katex.min.js"></script>'
+           '\n <script src="file/js/katex/auto-render.min.js"></script>'
+           '\n <script src="file/js/highlightjs/highlight.min.js"></script>'
+           '\n <script src="file/js/highlightjs/highlightjs-copy.min.js"></script>'
+           '\n <script>hljs.addPlugin(new CopyButtonPlugin());</script>'
+           '\n </head>'
        )

-       return io.BytesIO(file_contents)
+       if len(args) > 1 and args[1] == 'rb':
+           file_contents = file_contents.encode('utf-8')
+           return io.BytesIO(file_contents)
+       else:
+           return io.StringIO(file_contents)

    else:
        return original_open(*args, **kwargs)
