inference/trillium/JetStream-Maxtext/Llama-4-Scout-17B-16E/README.md (+19, -9)
```diff
@@ -58,7 +58,7 @@ GKE creates the following resources for the recipe:
 Before running this recipe, ensure your environment is configured as follows:
 
 - A GKE cluster with the following setup:
-  - A TPU Trillium node pool with a `ct6e-standard-8t` machine type.
+  - A TPU Trillium node pool with a `ct6e-standard-8t` machine type.
   - Topology-aware scheduling enabled
 - An Artifact Registry repository to store the Docker image.
 - A Google Cloud Storage (GCS) bucket to store results.
```
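The cluster and node pool are prerequisites rather than something the recipe creates. For reference, a Trillium node pool of this shape can be provisioned roughly as follows; this is a sketch in which the pool name, `CLUSTER_NAME`, `LOCATION`, and the topology value are assumptions, not values taken from the recipe:

```bash
# Sketch: provision a TPU Trillium (v6e) node pool with the ct6e-standard-8t
# machine type. The pool name, CLUSTER_NAME, LOCATION, and topology value
# are placeholders, not the recipe's values.
gcloud container node-pools create trillium-pool \
  --cluster=CLUSTER_NAME \
  --location=LOCATION \
  --machine-type=ct6e-standard-8t \
  --tpu-topology=2x4 \
  --num-nodes=1
```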
```diff
@@ -67,15 +67,14 @@ Before running this recipe, ensure your environment is configured as follows:
 - Google Cloud SDK
 - Helm
 - kubectl
-- To access the [Llama-4-Scout-17B-16E model](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E) through Hugging Face, you'll need a Hugging Face token. Ensure that you also sign the community license agreement and get gated access to the Meta models. Follow these steps to generate a new token if you don't have one already:
+- To access the [Llama-4-Scout-17B-16E model](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E) through Hugging Face, you'll need a Hugging Face token. **Ensure that you also sign the community license agreement and get gated access to the Meta models**. Follow these steps to generate a new token if you don't have one already:
   - Create a [Hugging Face account](https://huggingface.co/), if you don't already have one.
   - Click Your **Profile > Settings > Access Tokens**.
   - Select **New Token**.
   - Specify a Name and a Role of at least `Read`.
   - Select **Generate a token**.
   - Copy the generated token to your clipboard.
 
-
 ## Run the recipe
 
 ### Launch Cloud Shell
```
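Recipes like this one typically hand the token to workloads through a Kubernetes secret; a minimal sketch, where the secret name and key are illustrative rather than the chart's actual values:

```bash
# Sketch: expose the Hugging Face token to the cluster as a secret.
# The secret name and key are illustrative, not the chart's actual values.
export HF_TOKEN=<paste-your-generated-token>
kubectl create secret generic hf-secret \
  --from-literal=hf_api_token=$HF_TOKEN
```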
```diff
@@ -135,6 +134,17 @@ From your client, get the credentials for your cluster.
+Also note that [xpk](https://github.com/AI-Hypercomputer/xpk/) does not configure Workload Identity Federation. If you use xpk, follow [these instructions](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#enable-existing-cluster) to enable it.
+
 ### Build and push a docker container image to Artifact Registry
 
 To build the container, complete the following steps from your client:
```
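For context on the xpk note above: enabling Workload Identity Federation on an existing cluster follows the documented pattern below, with `CLUSTER_NAME`, `LOCATION`, and `PROJECT_ID` as placeholders; the linked instructions remain authoritative:

```bash
# Enable Workload Identity Federation on an existing cluster.
# CLUSTER_NAME, LOCATION, and PROJECT_ID are placeholders.
gcloud container clusters update CLUSTER_NAME \
  --location=LOCATION \
  --workload-pool=PROJECT_ID.svc.id.goog
```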
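The build steps themselves fall outside this hunk; the usual flow for building and pushing to Artifact Registry looks roughly like the following, with the registry host, repository path, and image tag all assumed for illustration:

```bash
# Sketch: build the serving image and push it to Artifact Registry.
# The registry host, repository path, and tag are assumptions.
gcloud auth configure-docker us-docker.pkg.dev
docker build -t us-docker.pkg.dev/PROJECT_ID/REPO_NAME/jetstream-maxtext:latest .
docker push us-docker.pkg.dev/PROJECT_ID/REPO_NAME/jetstream-maxtext:latest
```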
````diff
@@ -189,7 +199,7 @@ The recipe uses the helm chart to run the above steps.
       --dry-run=client -o yaml | kubectl apply -f -
    ```
 
-2. Convert the checkpoint from PyTorch to Orbax
+2. Convert the checkpoint from PyTorch to Orbax
 
    This job converts the checkpoint from PyTorch format to JAX Orbax format and unscans it for performant serving. This unscanned checkpoint is then stored in the mounted GCS bucket so that it can be used by the TPU node pool to bring up the JetStream server in the next step.
 
    ```bash
````
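The paragraph above says the unscanned checkpoint lands in the mounted GCS bucket; once the job completes, that can be confirmed directly, with the bucket name and output prefix assumed for illustration:

```bash
# Sketch: confirm the unscanned Orbax checkpoint landed in the bucket.
# The bucket name and prefix are placeholders, not the recipe's values.
gcloud storage ls gs://GCS_BUCKET/llama4-scout/unscanned/
```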
````diff
@@ -203,13 +213,13 @@ The recipe uses the helm chart to run the above steps.
       $USER-serving-llama4-model \
       $RECIPE_ROOT/prepare-model
    ```
-
+
    Run the following to verify that the job has completed.
    ```bash
    kubectl get job/$USER-serving-llama4-model-convert-ckpt
````
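To block until the conversion finishes rather than polling `kubectl get`, `kubectl wait` can watch the same job; the timeout below is illustrative:

```bash
# Wait for the conversion job to report completion (timeout is illustrative).
kubectl wait --for=condition=complete --timeout=120m \
  job/$USER-serving-llama4-model-convert-ckpt
```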