1 parent c1619b1 commit b0ab0c3
evalperf.html
@@ -132,7 +132,12 @@ <h3>Win-rate Leaderboard</h3>
<table id="leaderboard"
class="table table-responsive table-striped table-bordered flex-shrink-1 border border-5">
</table>
+ <p>🏪 Detailed model generation data and results are available in our page <a
+ href="https://github.com/evalplus/evalplus.github.io/tree/main/results/evalperf">repository</a>.</p>
+ <p>💸 We use 50 samples (half the usual count) for the o1 model series to save cost, and because strong models
+ yield the desired number of correct samples in fewer tries.</p>
+ <br>
<h3>Heatmap of Pairwise DPS Comparison</h3>
<div class="row w-100">
<div class="col-12">