Commit 087d3f8

Merge pull request #76 from karan/llama4-maxtext-readme
llama4-scout-maxtext: add note about KSA permissions
2 parents 27e2e15 + 0f15b38

File tree

1 file changed (+19, -9 lines)

  • inference/trillium/JetStream-Maxtext/Llama-4-Scout-17B-16E

inference/trillium/JetStream-Maxtext/Llama-4-Scout-17B-16E/README.md

Lines changed: 19 additions & 9 deletions
@@ -58,7 +58,7 @@ GKE creates the following resources for the recipe:
 Before running this recipe, ensure your environment is configured as follows:
 
 - A GKE cluster with the following setup:
-  - A TPU Trillium node pool with a `ct6e-standard-8t` machine type.
+  - A TPU Trillium node pool with a `ct6e-standard-8t` machine type.
   - Topology-aware scheduling enabled
 - An Artifact Registry repository to store the Docker image.
 - A Google Cloud Storage (GCS) bucket to store results.
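For context on these prerequisites, a minimal sketch of creating the Artifact Registry repository and the GCS results bucket is shown below; the repository, bucket, and region names are placeholder assumptions, not values from the recipe.

```bash
# Placeholder names and region; substitute your own values.
export REGION=us-east5
export AR_REPO=jetstream-maxtext
export GCS_BUCKET=llama4-recipe-results

# Artifact Registry repository for the recipe's Docker image.
gcloud artifacts repositories create ${AR_REPO} \
  --repository-format=docker \
  --location=${REGION}

# GCS bucket for results.
gcloud storage buckets create gs://${GCS_BUCKET} --location=${REGION}
```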
@@ -67,15 +67,14 @@ Before running this recipe, ensure your environment is configured as follows:
 - Google Cloud SDK
 - Helm
 - kubectl
-- To access the [Llama-4-Scout-17B-16E model](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E) through Hugging Face, you'll need a Hugging Face token. Ensure that you also sign the community license agreement and get gated access to the Meta models. Follow these steps to generate a new token if you don't have one already:
+- To access the [Llama-4-Scout-17B-16E model](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E) through Hugging Face, you'll need a Hugging Face token. **Ensure that you also sign the community license agreement and get gated access to the Meta models**. Follow these steps to generate a new token if you don't have one already:
   - Create a [Hugging Face account](https://huggingface.co/), if you don't already have one.
   - Click Your **Profile > Settings > Access Tokens**.
   - Select **New Token**.
   - Specify a Name and a Role of at least `Read`.
   - Select **Generate a token**.
   - Copy the generated token to your clipboard.
 
-
 ## Run the recipe
 
 ### Launch Cloud Shell
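The cleanup hunk at the end of this diff deletes a Kubernetes secret named `hf-secret`, so the token generated above is presumably stored in that secret. A hedged sketch, assuming the secret key is named `hf_api_token` (the actual key name is not shown in this diff):

```bash
# Assumption: the token is stored under the key `hf_api_token` in the
# `hf-secret` secret that the cleanup step later deletes.
export HF_TOKEN=<your Hugging Face token>

kubectl create secret generic hf-secret \
  --from-literal=hf_api_token=${HF_TOKEN}
```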
@@ -135,6 +134,17 @@ From your client, get the credentials for your cluster.
135134
gcloud container clusters get-credentials $CLUSTER_NAME --region $CLUSTER_REGION
136135
```
137136

137+
### Grant access to your GCS bucket from the KSA
138+
139+
```
140+
gcloud storage buckets add-iam-policy-binding gs://<GCS_BUCKET> \
141+
--role=roles/storage.objectViewer \
142+
--member=principal://iam.googleapis.com/projects/630405687483/locations/global/workloadIdentityPools/<PROJECT_ID>.svc.id.goog/subject/ns/default/sa/default \
143+
--condition=None
144+
```
145+
146+
Also note that [xpk](https://github.com/AI-Hypercomputer/xpk/) does not configure Workload Identity Federation. If you use xpk, follow [these instructions](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#enable-existing-cluster) to enable it.
147+
138148
### Build and push a docker container image to Artifact Registry
139149

140150
To build the container, complete the following steps from your client:
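The individual build-and-push steps are outside this hunk. As a rough sketch, assuming an image path in the Artifact Registry repository from the prerequisites (the recipe's real image name and tag are not shown here):

```bash
# Placeholder image path; the recipe's actual registry, repository, and tag
# are not part of this diff.
export IMAGE=us-east5-docker.pkg.dev/<PROJECT_ID>/jetstream-maxtext/llama4-jetstream:latest

# Authenticate docker to Artifact Registry, then build and push.
gcloud auth configure-docker us-east5-docker.pkg.dev
docker build -t ${IMAGE} .
docker push ${IMAGE}
```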
@@ -189,7 +199,7 @@ The recipe uses the helm chart to run the above steps.
 --dry-run=client -o yaml | kubectl apply -f -
 ```
 
-2. Convert the checkpoint from PyTorch to Orbax
+2. Convert the checkpoint from PyTorch to Orbax
 This job converts the checkpoint from PyTorch format to JAX Orbax format and unscans it for performant serving. This unscanned checkpoint is then stored in the mounted GCS bucket so that it can be used by the TPU nodepool to bring up the JetStream serve in the next step.
 
 ```bash
@@ -203,13 +213,13 @@ The recipe uses the helm chart to run the above steps.
 $USER-serving-llama4-model \
 $RECIPE_ROOT/prepare-model
 ```
-
+
 Run the following to verify if the job has been completed.
 ```bash
 kubectl get job/$USER-serving-llama4-model-convert-ckpt
 
 NAME                                     STATUS    COMPLETIONS   DURATION   AGE
-user-serving-llama4-model-convert-ckpt   Running   1/1           26m        26m
+user-serving-llama4-model-convert-ckpt   Running   1/1           26m        26m
 ```
 
 Uninstall the helm chart once done
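Instead of re-running `kubectl get job` until it completes, one way to block on the conversion job and then remove the release is sketched below; the release name follows the `$USER-serving-llama4-model` pattern used in this hunk, and the timeout value is an arbitrary assumption.

```bash
# Block until the checkpoint-conversion job completes (timeout is an assumption).
kubectl wait --for=condition=complete \
  job/$USER-serving-llama4-model-convert-ckpt --timeout=60m

# Uninstall the helm release once the job is done.
helm uninstall $USER-serving-llama4-model
```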
@@ -279,9 +289,9 @@ The recipe uses the helm chart to run the above steps.
 --save-result \
 --request-outputs-file-path mmlu_outputs.json
 ```
-
+
 5. Stop the server and clean up the resources after completion by following the steps in the [Cleanup](#cleanup) section.
-
+
 
 ### Cleanup
 
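The benchmark's per-request outputs land in `mmlu_outputs.json`. Assuming it is a plain JSON file on the client where the benchmark ran, a schema-agnostic way to take a quick look is:

```bash
# Pretty-print the start of the saved benchmark outputs without assuming a schema.
python3 -m json.tool mmlu_outputs.json | head -n 40
```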
@@ -297,4 +307,4 @@ To clean up the resources created by this recipe, complete the following steps:
 
 ```bash
 kubectl delete secret hf-secret
-```
+```
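If you also want to revoke the bucket grant added earlier in this change, the inverse of the `add-iam-policy-binding` command is sketched here; it is not one of the recipe's documented cleanup steps, and the member string simply mirrors the one granted above.

```bash
# Optional: remove the objectViewer binding for the default KSA.
gcloud storage buckets remove-iam-policy-binding gs://<GCS_BUCKET> \
  --role=roles/storage.objectViewer \
  --member=principal://iam.googleapis.com/projects/630405687483/locations/global/workloadIdentityPools/<PROJECT_ID>.svc.id.goog/subject/ns/default/sa/default
```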
