Welcome to the gpt-oss series, [OpenAI's open-weight models](https://openai.com/open-models/) designed for powerful reasoning, agentic tasks, and versatile developer use cases.

We're releasing two flavors of these open models:

- `gpt-oss-120b` — for production, general-purpose, high-reasoning use cases; fits on a single H100 GPU (117B parameters with 5.1B active parameters)
- `gpt-oss-20b` — for lower-latency, local, or specialized use cases (21B parameters with 3.6B active parameters)

Both models were trained on our [harmony response format][harmony] and should only be used with that format; they will not work correctly otherwise.

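As a quick illustration of that format, here is a minimal sketch of rendering a prompt with the [`openai-harmony`][harmony] package (the conversation content is illustrative):

```python
from openai_harmony import (
    Conversation,
    HarmonyEncodingName,
    Message,
    Role,
    load_harmony_encoding,
)

# load the harmony encoding used by the gpt-oss models
encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)

conversation = Conversation.from_messages(
    [Message.from_role_and_content(Role.USER, "Hello, gpt-oss!")]
)

# token IDs for the rendered prompt, primed for the assistant's next turn
token_ids = encoding.render_conversation_for_completion(conversation, Role.ASSISTANT)
```
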
For example, to serve `gpt-oss-20b` with vLLM:

```bash
vllm serve openai/gpt-oss-20b
```

[Learn more about how to use gpt-oss with vLLM.](https://cookbook.openai.com/articles/gpt-oss/run-vllm)

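Once the server is up, any OpenAI-compatible client can talk to it. A minimal sketch, assuming vLLM's default port 8000 (the prompt is illustrative):

```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible API; no real API key is needed locally
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```
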
#### PyTorch / Triton / Metal

These are largely reference implementations for educational purposes and are not expected to be run in production.

Check out our [awesome list](./awesome-gpt-oss.md) for a broader collection of gpt-oss resources.

This repository provides a collection of reference implementations:

- **Inference:**
  - [`torch`](#reference-pytorch-implementation) — a non-optimized [PyTorch](https://pytorch.org/) implementation for educational purposes only; because it is unoptimized, it requires at least 4x H100 GPUs
  - [`triton`](#reference-triton-implementation-single-gpu) — a more optimized implementation using [PyTorch](https://pytorch.org/) & [Triton](https://github.com/triton-lang/triton), including CUDA graphs and basic caching
  - [`metal`](#reference-metal-implementation) — a Metal-specific implementation for running the models on Apple Silicon hardware
- **Tools:**
  - [`browser`](#browser) — a reference implementation of the browser tool the models were trained on
  - [`python`](#python) — a stateless reference implementation of the python tool the model was trained on
- **Client examples:**
  - [`chat`](#terminal-chat) — a basic terminal chat application that uses the PyTorch or Triton implementations for inference, along with the python and browser tools
  - [`responses_api`](#responses-api) — an example Responses API-compatible server that implements the browser tool along with other Responses-compatible functionality

## Setup

If you encounter `torch.OutOfMemoryError`, make sure to turn on the expandable allocator.

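One way to do that is through PyTorch's standard `PYTORCH_CUDA_ALLOC_CONF` environment variable; a minimal sketch (it must be set before CUDA is initialized):

```python
import os

# must happen before torch initializes CUDA
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # noqa: E402

print(torch.cuda.is_available())
```

Exporting the same variable in your shell before launching has the same effect.
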
## Reference Metal implementation

Additionally, we provide a reference implementation for Metal that runs on Apple Silicon. This implementation is not production-ready, but its outputs match the PyTorch implementation.

The implementation is compiled automatically when you run the `.[metal]` installation on an Apple Silicon device.

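The exact command is elided from this excerpt; a plausible invocation, assuming an editable install of this repository with the `metal` extra, is:

```bash
# hypothetical: install the project with its Metal extra, triggering the build
pip install -e ".[metal]"
```
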
We also include two system tools for the model: browsing and a Python container.

### Terminal Chat

The terminal chat application is a basic example of how to use the harmony format together with the PyTorch, Triton, and vLLM implementations. It also exposes the python and browser tools as optional tools.

```bash
usage: python -m gpt_oss.chat [-h] [-r REASONING_EFFORT] [-a] [-b] [--show-browser-results] [-p] [--developer-message DEVELOPER_MESSAGE] [-c CONTEXT] [--raw] [--backend {triton,torch,vllm}] FILE
```

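For example, a hypothetical invocation (the checkpoint path and flag choices are illustrative; run with `-h` for the authoritative flag descriptions):

```bash
# illustrative: high reasoning effort, Triton backend, FILE pointing at a local checkpoint
python -m gpt_oss.chat -r high --backend triton gpt-oss-20b/original/
```
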
To improve performance, the tool caches requests so that the model can revisit a page without re-fetching it.

### Python

The model was trained to use a python tool to perform calculations and other actions as part of its chain-of-thought. During training, the model used a stateful tool, which makes running tools between CoT loops easier. This reference implementation, however, uses a stateless mode. As a result, the `PythonTool` defines its own tool description to override the definition in [`openai-harmony`][harmony].

> [!WARNING]
> This implementation runs in a permissive Docker container, which could be problematic in cases like prompt injection. It serves as an example, and you should consider implementing your own container restrictions in production.

```python
system_message = Message.from_role_and_content(Role.SYSTEM, system_message_content)

# create the overall prompt
messages = [system_message, Message.from_role_and_content(Role.USER, "What's the square root of 9001?")]
conversation = Conversation.from_messages(messages)

# convert to tokens
```

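The continuation of this snippet is elided above. Roughly, it renders `conversation` into prompt tokens and, after inference, parses the sampled completion back into structured harmony messages; a sketch using [`openai-harmony`][harmony] (the inference call itself is omitted):

```python
from openai_harmony import HarmonyEncodingName, Role, load_harmony_encoding

encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)

# render the conversation built above into prompt tokens
prompt_tokens = encoding.render_conversation_for_completion(conversation, Role.ASSISTANT)

# run your backend on `prompt_tokens` to obtain `completion_tokens`, then:
# parsed = encoding.parse_messages_from_completion_tokens(completion_tokens, Role.ASSISTANT)
```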