Commit c1619b1

evalperf: add o1-mini
1 parent 600e571 commit c1619b1

5 files changed: +1270 -1052 lines changed

evalperf.html

Lines changed: 5 additions & 9 deletions
@@ -126,6 +126,9 @@ <h1 class="text-nowrap mt-5" style="font-size: xx-large;">

 <div class="container d-flex flex-column align-items-center gap-3 mt-5">
 <h3>Win-rate Leaderboard</h3>
+<p>📝 Notes: the default prompt does not emphasize efficiency requirements as our work shows such emphasis
+might degrade both efficiency and correctness for some weak models. Yet, "(🔥)" marks models using
+performance-encouraging prompts as they might be able to accurately understand such needs.</p>
 <table id="leaderboard"
 class="table table-responsive table-striped table-bordered flex-shrink-1 border border-5">
 </table>
@@ -233,15 +236,10 @@ <h2 id="sponsor" class="text-nowrap mt-5">🤗 Acknowledgment</h2>
 modelId = modelId[1];
 url = hfLinkPrefix + modelOrg + "/" + modelId;
 linkMapping.set(modelId, url);
-} else if (modelId.startsWith("gpt-4-")) {
+} else if (modelId.startsWith("o1-") || modelId.startsWith("gpt-")) {
 linkMapping.set(
 modelId,
-"https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4",
-);
-} else if (modelId.startsWith("gpt-3.5-")) {
-linkMapping.set(
-modelId,
-"https://platform.openai.com/docs/models/gpt-3-5-turbo",
+"https://platform.openai.com/docs/models",
 );
 } else if (modelId.startsWith("claude-3-")) {
 linkMapping.set(
@@ -258,8 +256,6 @@ <h2 id="sponsor" class="text-nowrap mt-5">🤗 Acknowledgment</h2>
 modelId,
 "https://deepmind.google/technologies/gemini/flash/",
 );
-} else if (modelId.startsWith("gpt-4o-")) {
-linkMapping.set(modelId, "https://openai.com/index/hello-gpt-4o/");
 } else if (modelId.startsWith("deepseek-chat")) {
 linkMapping.set(modelId, "https://chat.deepseek.com/")
 } else if (modelId == "heatmap_data") {
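
Reviewer sketch (not part of the commit): the evalperf.html hunks above fold the separate "gpt-4-", "gpt-3.5-", and "gpt-4o-" branches into one check that also covers "o1-", so every OpenAI model now links to the generic models page. The snippet below is a minimal standalone illustration of that prefix matching. The names PREFIX_LINKS and resolveModelLink are hypothetical helpers, the commit itself keeps an inline if/else chain, and the sample IDs are examples rather than entries taken from the leaderboard data.

```javascript
// Hypothetical table-driven version of the prefix checks in the diff.
// Only the two URLs below appear in the commit; everything else is illustrative.
const PREFIX_LINKS = [
  [["o1-", "gpt-"], "https://platform.openai.com/docs/models"],
  [["deepseek-chat"], "https://chat.deepseek.com/"],
];

// Return the documentation URL for the first matching prefix group,
// or undefined when no prefix matches.
function resolveModelLink(modelId) {
  for (const [prefixes, url] of PREFIX_LINKS) {
    if (prefixes.some((prefix) => modelId.startsWith(prefix))) {
      return url;
    }
  }
  return undefined;
}

// Populate a Map the same way the page's linkMapping is filled.
const linkMapping = new Map();
const sampleIds = ["o1-mini", "gpt-4o-2024-05-13", "gpt-3.5-turbo-0125", "deepseek-chat"];
for (const id of sampleIds) {
  const url = resolveModelLink(id);
  if (url !== undefined) {
    linkMapping.set(id, url);
  }
}
```

Because "gpt-4o-..." already starts with "gpt-", a later "gpt-4o-" branch could never be reached after this change, which is consistent with the commit deleting it.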

leaderboard.html

Lines changed: 0 additions & 1 deletion
@@ -170,7 +170,6 @@ <h3>📝 Notes</h3>
 </div>
 </div>
 <div id="notes">
-<h3 id="sponsor" class="text-nowrap mt-5">🖊️ Citation</h3>
 <h3>🤗 More Leaderboards</h3>
 In addition to EvalPlus leaderboards, it is recommended to
 comprehensively understand LLM coding ability through a diverse set of
