Commit

update
	modified:   index.html
	modified:   raw/fig/overview.png
dkguo committed Mar 14, 2024
1 parent 83e7670 commit b6c4bda
Showing 2 changed files with 20 additions and 19 deletions.
39 changes: 20 additions & 19 deletions index.html
@@ -46,11 +46,8 @@ <h1 id="">

<h3 id="">
<center>A work submitted to INTERSPEECH 2024</center>
- <!-- <center>Dake Guo<sup>1</sup>, Xinfa Zhu<sup>1</sup>, Liumeng Xue<sup>2</sup>, Tao Li<sup>1</sup>, Yuanjun Lv<sup>1</sup>, Yuepeng Jiang<sup>1</sup>, Lei Xie<sup>1</sup></center>
- <center><sup>1</sup>Audio, Speech and Language Processing Group (ASLP@NPU), School of Computer Science, Northwestern Polytechnical University, Xi'an, China </center>
- <center><sup>2</sup>School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), China </center> -->
</h3>
<!-- <center>Accepted by ASRU 2023</center> -->



<br><br>
@@ -64,39 +61,43 @@ <h2 id="abstract">1. Abstract<a name="abstract"></a></h2>
embedding obtained from text. Finally, we incorporate the context encoder into two typical TTS models: VITS-based
TTS and language-model-based TTS. Experimental results demonstrate that our proposed approach effectively captures
diverse styles and coherent prosody, and thus improves the naturalness and expressiveness of audiobook speech synthesis.</p>
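As a concrete illustration of the idea the abstract describes — embeddings of neighbouring sentences are fused into one context embedding, which then conditions a TTS model's phoneme encoder — here is a minimal, hypothetical PyTorch sketch. All module names, dimensions, and the fuse-by-concatenation choice are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Encodes embeddings of neighbouring sentences into one context vector.
    (Hypothetical sketch; the paper's actual encoder may differ.)"""
    def __init__(self, text_dim=256, ctx_dim=128):
        super().__init__()
        self.gru = nn.GRU(text_dim, ctx_dim, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * ctx_dim, ctx_dim)

    def forward(self, sent_emb):             # (batch, n_sentences, text_dim)
        _, h = self.gru(sent_emb)            # h: (2, batch, ctx_dim)
        h = torch.cat([h[0], h[1]], dim=-1)  # concat both directions
        return self.proj(h)                  # (batch, ctx_dim)

class ContextConditioner(nn.Module):
    """Broadcasts the context vector over the TTS encoder states and fuses."""
    def __init__(self, enc_dim=192, ctx_dim=128):
        super().__init__()
        self.fuse = nn.Linear(enc_dim + ctx_dim, enc_dim)

    def forward(self, enc_states, ctx):      # (B, T, enc_dim), (B, ctx_dim)
        ctx = ctx.unsqueeze(1).expand(-1, enc_states.size(1), -1)
        return self.fuse(torch.cat([enc_states, ctx], dim=-1))

# Toy usage: 5 neighbouring-sentence embeddings condition 17 encoder frames.
ctx = ContextEncoder()(torch.randn(2, 5, 256))
out = ContextConditioner()(torch.randn(2, 17, 192), ctx)
```

The same context vector can condition either backbone (VITS or LM); only the fusion point changes.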

<br><br>
<table frame=void rules=none>
<tr>
- <center><img src='raw/fig/overview.png' width="50%"></center>
- <center>Overview of our framework</center>
+ <center><img src='raw/fig/overview.png' width="60%"></center>
+ <center><span><b>Figure 1. Overview of our framework</b></span></center>
</tr>
- <tr><br><br></tr>
+ <tr><br></tr>
<tr>
- <center><img src='raw/fig/vits.png' width="40%"></center>
- <center>Text-aware Context-aware VITS</center>
+ <td>
+ <center><img src='raw/fig/vits.png' width="80%"></center>
+ <center><span><b>Figure 2. Text-aware Context-aware VITS (TACA-VITS)</b></span></center>
+ </td>
+ <td>
+ <center><img src='raw/fig/lm.png' width="70%"></center>
+ <center><span><b>Figure 3. Text-aware Context-aware Language Model (TACA-LM)</b></span></center>
+ </td>
</tr>
- <tr><br><br></tr>
- <tr>
- <center><img src='raw/fig/lm.png' width="40%"></center>
- <center>Text-aware Context-aware LM</center>
- </tr>
- </tr>
+ <!-- <tr><br></tr>
+ <tr></tr>
+ <tr>
+ <center><img src='raw/fig/lm.png' width="40%"></center>
+ <center>Text-aware Context-aware LM</center>
+ </tr>
+ </tr> -->
</table>
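Figure 3 depicts TACA-LM, an LM-based TTS operating on discrete (HuBERT-style) speech tokens. As a toy sketch of that model family — not the paper's architecture — an autoregressive token LM can be decoded greedily as below; the vocabulary size, dimensions, and GRU backbone are illustrative assumptions (real systems typically use Transformers and also condition on text and context embeddings, with a vocoder turning tokens into audio).

```python
import torch
import torch.nn as nn

class TokenLM(nn.Module):
    """Toy autoregressive LM over discrete speech-token ids (hypothetical)."""
    def __init__(self, vocab=512, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab)

    @torch.no_grad()
    def generate(self, prefix, steps=10):
        tokens, h = prefix, None             # prefix: (B, T) token ids
        for _ in range(steps):
            # feed the most recent token, carrying the recurrent state
            x, h = self.rnn(self.emb(tokens[:, -1:]), h)
            nxt = self.head(x).argmax(-1)    # greedy next speech token
            tokens = torch.cat([tokens, nxt], dim=1)
        return tokens

# Toy usage: extend a 1-token prefix by 8 greedily decoded speech tokens.
lm = TokenLM()
out = lm.generate(torch.zeros(1, 1, dtype=torch.long), steps=8)
```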
<br><br>


<h2>2. Demos <a name="Comparison"></a></h2>
- <p>Methods</p>
+ <!-- <p>Methods</p>
<ul>
<li><b><a href="https://github.com/fishaudio/Bert-VITS2">VITS : </a></b>A VITS2 with multilingual BERT<sup>1</sup></li>
<li><b>TACA-VITS : </b>A Text-Aware and Context-Aware VITS</li>
<li><b>LM : </b>An LM with HuBERT tokens</li>
- <li><b>TACA-VITS : </b>A Text-Aware and Context-Aware</li>
- </ul>
+ <li><b>TACA-LM : </b>A Text-Aware and Context-Aware LM-based TTS</li>
+ </ul> -->

<h3>Speech</h3>

<table>
<tbody id="tbody_speech">
</tbody>
Binary file modified raw/fig/overview.png

0 comments on commit b6c4bda
