Scenario Demo
raghav-2002-os committed Dec 16, 2024
1 parent 6ffa7e3 commit dab63a0
Showing 1 changed file with 149 additions and 1 deletion.
150 changes: 149 additions & 1 deletion docs/index.html
@@ -101,6 +101,17 @@
background-color: #f1f1f1;
display: none;
}
table {
width: 100%;
border-collapse: collapse;
}
table, th, td {
border: 1px solid black;
}
th, td {
padding: 8px;
text-align: left;
}

.btn {
display: flex; /* Use flexbox for layout */
@@ -584,7 +595,144 @@ <h2 class="title is-3">Results</h2>
<img src="./images/prompting.png" >
</p>
<p align="center"><b>Substantial performance improvements across all models when optimized prompts are generated by PromptWizard on GSM8k dataset</b></p>
</div>
</div>
<br>
<button class="btn" onclick="toggleContent(this,'11')">Comparison with Feedback-based and other Prompt Optimization Techniques<span class="icon">+</span></button>
<div class="col_content_11">
<p align="center">
<table>
<tr>
<td>Dataset</td>
<td colspan="4">Accuracy (higher is better)</td>
</tr>
<tr>
<td></td>
<td>DSPy</td>
<td>PromptAgent </td>
<td>APO</td>
<td>PW</td>
</tr>
<tr>
<td>GSM8k</td>
<td>78.2</td>
<td>68.84</td>
<td>25.67</td>
<td><b>90</b></td>
</tr>
<tr>
<td>AQUARAT</td>
<td>55.1</td>
<td>56.67</td>
<td>20.12</td>
<td><b>58.2</b></td>
</tr>
<tr>
<td>SVAMP</td>
<td>77</td>
<td>78.67</td>
<td>75.25</td>
<td><b>82.3</b></td>
</tr>
<tr>
<td>ETHOS</td>
<td>84.1</td>
<td>84.25</td>
<td>80.62</td>
<td><b>89.4</b></td>
</tr>
</table>
<br>
<table>
<tr>
<td>Dataset</td>
<td colspan="4">API calls for optimization (lower is better)</td>
</tr>
<tr>
<td></td>
<td>DSPy</td>
<td>PromptAgent </td>
<td>APO</td>
<td>PW</td>
</tr>
<tr>
<td>GSM8k</td>
<td>915</td>
<td>2115</td>
<td>8490</td>
<td><b>147</b></td>
</tr>
<tr>
<td>AQUARAT</td>
<td>920</td>
<td>2200</td>
<td>8500</td>
<td><b>112</b></td>
</tr>
<tr>
<td>SVAMP</td>
<td>2300</td>
<td>2111</td>
<td>8000</td>
<td><b>178</b></td>
</tr>
<tr>
<td>ETHOS</td>
<td>660</td>
<td>2217</td>
<td>8200</td>
<td><b>80</b></td>
</tr>
</table>
<br>
<table>
<tr>
<td>Dataset</td>
<td colspan="4">Average tokens per call (lower is better)</td>
</tr>
<tr>
<td></td>
<td>DSPy</td>
<td>PromptAgent </td>
<td>APO</td>
<td>PW</td>
</tr>
<tr>
<td>GSM8k</td>
<td>262</td>
<td>500</td>
<td><b>109</b></td>
<td>237</td>
</tr>
<tr>
<td>AQUARAT</td>
<td>326</td>
<td>875</td>
<td><b>125</b></td>
<td>200</td>
</tr>
<tr>
<td>SVAMP</td>
<td>189</td>
<td>680</td>
<td><b>85</b></td>
<td>127</td>
</tr>
<tr>
<td>ETHOS</td>
<td>175</td>
<td>417</td>
<td><b>55</b></td>
<td>190</td>
</tr>
</table>
</p>
<br>
<p align="center"><b>PromptWizard outperforms feedback-based methods such as APO and PromptAgent, as well as other prompt optimization techniques such as DSPy, in both accuracy and the number of API calls needed for optimization across the four datasets shown. In average tokens per call,
PromptWizard uses the second-fewest on three of the four datasets, behind only APO. APO is a technique designed only for binary classification tasks, so it generates smaller prompts (and hence uses fewer tokens) but does not extend to
other tasks.</b>
</p>
</div>
</div>
</div>
</div>
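
The API-call claim in the caption can be checked directly against the table added in this commit. The following is a minimal sketch, not part of the commit: the `calls` object copies the numbers from the calls table, and `callRatios` is a hypothetical helper name introduced only for illustration. It computes how many times more optimization calls each baseline needs than PW.

```javascript
// Optimization API calls per method, copied from the calls table in this diff.
const calls = {
  GSM8k:   { DSPy: 915,  PromptAgent: 2115, APO: 8490, PW: 147 },
  AQUARAT: { DSPy: 920,  PromptAgent: 2200, APO: 8500, PW: 112 },
  SVAMP:   { DSPy: 2300, PromptAgent: 2111, APO: 8000, PW: 178 },
  ETHOS:   { DSPy: 660,  PromptAgent: 2217, APO: 8200, PW: 80 },
};

// Hypothetical helper: for one table row, return baseline calls divided by PW calls.
function callRatios(row) {
  const ratios = {};
  for (const [method, n] of Object.entries(row)) {
    if (method !== "PW") ratios[method] = n / row.PW;
  }
  return ratios;
}

// Print one summary line per dataset, with ratios rounded to one decimal place.
for (const [dataset, row] of Object.entries(calls)) {
  const parts = Object.entries(callRatios(row))
    .map(([method, r]) => `${method} ${r.toFixed(1)}x`);
  console.log(`${dataset}: ${parts.join(", ")}`);
}
```

On these numbers the ratios range from about 6x (DSPy on GSM8k) to about 100x (APO on ETHOS), which is consistent with the caption's claim about API calls.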