Commit

update
	modified:   index.html
	modified:   raw/fig/overview.png
dkguo committed Mar 14, 2024
1 parent 83e7670 commit b6c4bda
Showing 2 changed files with 20 additions and 19 deletions.
39 changes: 20 additions & 19 deletions index.html
@@ -46,11 +46,8 @@ <h1 id="">

<h3 id="">
<center>A work submitted to INTERSPEECH 2024</center>
- <!-- <center>Dake Guo<sup>1</sup>, Xinfa Zhu<sup>1</sup>, Liumeng Xue<sup>2</sup>, Tao Li<sup>1</sup>, Yuanjun Lv<sup>1</sup>, Yuepeng Jiang<sup>1</sup>, Lei Xie<sup>1</sup></center>
- <center><sup>1</sup>Audio, Speech and Language Processing Group (ASLP@NPU), School of Computer Science, Northwestern Polytechnical University, Xi'an, China </center>
- <center><sup>2</sup>School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), China </center> -->
</h3>
<!-- <center>Accepted by ASRU 2023</center> -->



<br><br>
@@ -64,39 +61,43 @@ <h2 id="abstract">1. Abstract<a name="abstract"></a></h2>
embedding obtained from text. Finally, we incorporate the context encoder into two typical TTS models: VITS-based
TTS and language-model-based TTS. Experimental results demonstrate that our proposed approach effectively captures
diverse styles and coherent prosody, and thus improves the naturalness and expressiveness of audiobook speech synthesis.</p>
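As a concrete illustration of the idea the abstract describes — embeddings of neighbouring sentences are fused into one context embedding, which then conditions a TTS model's phoneme encoder — here is a minimal, hypothetical PyTorch sketch. All module names, dimensions, and the fuse-by-concatenation choice are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Encodes embeddings of neighbouring sentences into one context vector.
    (Hypothetical sketch; the paper's actual encoder may differ.)"""
    def __init__(self, text_dim=256, ctx_dim=128):
        super().__init__()
        self.gru = nn.GRU(text_dim, ctx_dim, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * ctx_dim, ctx_dim)

    def forward(self, sent_emb):             # (batch, n_sentences, text_dim)
        _, h = self.gru(sent_emb)            # h: (2, batch, ctx_dim)
        h = torch.cat([h[0], h[1]], dim=-1)  # concat both directions
        return self.proj(h)                  # (batch, ctx_dim)

class ContextConditioner(nn.Module):
    """Broadcasts the context vector over the TTS encoder states and fuses."""
    def __init__(self, enc_dim=192, ctx_dim=128):
        super().__init__()
        self.fuse = nn.Linear(enc_dim + ctx_dim, enc_dim)

    def forward(self, enc_states, ctx):      # (B, T, enc_dim), (B, ctx_dim)
        ctx = ctx.unsqueeze(1).expand(-1, enc_states.size(1), -1)
        return self.fuse(torch.cat([enc_states, ctx], dim=-1))

# Toy usage: 5 neighbouring-sentence embeddings condition 17 encoder frames.
ctx = ContextEncoder()(torch.randn(2, 5, 256))
out = ContextConditioner()(torch.randn(2, 17, 192), ctx)
```

The same context vector can condition either backbone (VITS or LM); only the fusion point changes.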

<br><br>
<table frame=void rules=none>
<tr>
- <center><img src='raw/fig/overview.png' width="50%"></center>
- <center>Overview of our framework</center>
+ <center><img src='raw/fig/overview.png' width="60%"></center>
+ <center><span><b>Figure 1. Overview of our framework</b></span></center>
</tr>
- <tr><br><br></tr>
+ <tr><br></tr>
<tr>
- <center><img src='raw/fig/vits.png' width="40%"></center>
- <center>Text-aware Context-aware VITS</center>
+ <td>
+ <center><img src='raw/fig/vits.png' width="80%"></center>
+ <center><span><b>Figure 2. Text-aware Context-aware VITS (TACA-VITS)</b></span></center>
+ </td>
+ <td>
+ <center><img src='raw/fig/lm.png' width="70%"></center>
+ <center><span><b>Figure 3. Text-aware Context-aware Language Model (TACA-LM)</b></span></center>
+ </td>
</tr>
- <tr><br><br></tr>
- <tr>
- <center><img src='raw/fig/lm.png' width="40%"></center>
- <center>Text-aware Context-aware LM</center>
- </tr>
- </tr>
+ <!-- <tr><br></tr>
+ <tr></tr>
+ <tr>
+ <center><img src='raw/fig/lm.png' width="40%"></center>
+ <center>Text-aware Context-aware LM</center>
+ </tr>
+ </tr> -->
</table>
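Figure 3 depicts TACA-LM, an LM-based TTS operating on discrete (HuBERT-style) speech tokens. As a toy sketch of that model family — not the paper's architecture — an autoregressive token LM can be decoded greedily as below; the vocabulary size, dimensions, and GRU backbone are illustrative assumptions (real systems typically use Transformers and also condition on text and context embeddings, with a vocoder turning tokens into audio).

```python
import torch
import torch.nn as nn

class TokenLM(nn.Module):
    """Toy autoregressive LM over discrete speech-token ids (hypothetical)."""
    def __init__(self, vocab=512, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab)

    @torch.no_grad()
    def generate(self, prefix, steps=10):
        tokens, h = prefix, None             # prefix: (B, T) token ids
        for _ in range(steps):
            # feed the most recent token, carrying the recurrent state
            x, h = self.rnn(self.emb(tokens[:, -1:]), h)
            nxt = self.head(x).argmax(-1)    # greedy next speech token
            tokens = torch.cat([tokens, nxt], dim=1)
        return tokens

# Toy usage: extend a 1-token prefix by 8 greedily decoded speech tokens.
lm = TokenLM()
out = lm.generate(torch.zeros(1, 1, dtype=torch.long), steps=8)
```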
<br><br>


<h2>2. Demos <a name="Comparison"></a></h2>
- <p>Methods</p>
+ <!-- <p>Methods</p>
<ul>
<li><b><a href="https://github.com/fishaudio/Bert-VITS2">VITS : </a></b>A VITS2 with multilingual BERT<sup>1</sup></li>
<li><b>TACA-VITS : </b>A Text-Aware and Context-Aware VITS</li>
<li><b>LM : </b>An LM with HuBERT tokens</li>
- <li><b>TACA-VITS : </b>A Text-Aware and Context-Aware</li>
- </ul>
+ <li><b>TACA-LM : </b>A Text-Aware and Context-Aware LM-based TTS</li>
+ </ul> -->

<h3>Speech</h3>

<table>
<tbody id="tbody_speech">
</tbody>
Binary file modified raw/fig/overview.png

0 comments on commit b6c4bda
