v0.0.7
What's Changed
- Evals: correctly pass temperature/max_tokens when using Responses API by @Maratyszcza in #174
- Metal: move sampling to GPU by @Maratyszcza in #175
- Metal: benchmark generation of 100 tokens instead of 1 by @Maratyszcza in #178
- Metal: support generating multiple tokens at once by @Maratyszcza in #179
- Add prefill benchmarking for the Metal backend by @ibahmed-oai in #181
- Metal: tune threadgroup sizes by @Maratyszcza in #180
- Metal: add optimized dense matmul kernel to improve prefill performance by @ibahmed-oai in #183
- Metal: fused QKV projection (matmul+RoPE) kernel by @Maratyszcza in #184
- [Bugfix] Capture stderr for the Python tool with uv as backend by @wuhang2014 in #182
New Contributors
- @ibahmed-oai made their first contribution in #181
- @wuhang2014 made their first contribution in #182
Full Changelog: v0.0.6...v0.0.7