
xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool that helps Cloud developers orchestrate training jobs on accelerators such as TPUs and GPUs on GKE.


AI-Hypercomputer/xpk


Overview

XPK (Accelerated Processing Kit, pronounced x-p-k) is a command line interface that simplifies cluster creation and workload execution on Google Kubernetes Engine (GKE). XPK generates preconfigured, training-optimized clusters and makes workload scheduling easy, without requiring any Kubernetes expertise.

XPK is recommended for quickly creating GKE clusters for proofs of concept and testing.

XPK decouples provisioning capacity from running jobs. There are two structures: clusters (provisioned VMs) and workloads (training jobs). Clusters represent the physical resources you have available. Workloads represent training jobs -- at any time some of these will be completed, others will be running and some will be queued, waiting for cluster resources to become available.
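To make the cluster/workload split concrete, here is a minimal Python sketch under stated assumptions: a cluster is modeled as a fixed pool of accelerator chips that is provisioned once, and workloads move from queued to running to completed in strict first-in, first-out order, handing their chips back when they finish. The `Cluster` and `Workload` classes, the chip counts, and the FIFO policy are hypothetical illustrations, not part of xpk's code or API.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical toy model of the cluster/workload split described above.
# None of these names come from xpk; they only illustrate the lifecycle.


@dataclass
class Workload:
    name: str
    chips: int               # accelerator chips the job needs
    status: str = "queued"   # queued -> running -> completed


class Cluster:
    """A fixed pool of accelerator chips, provisioned once and reused."""

    def __init__(self, name: str, total_chips: int) -> None:
        self.name = name
        self.total_chips = total_chips
        self.free_chips = total_chips
        self.queue: deque[Workload] = deque()
        self.running: list[Workload] = []
        self.completed: list[Workload] = []

    def submit(self, wl: Workload) -> None:
        # Submitting a job never changes the cluster's capacity; it only queues it.
        if wl.chips > self.total_chips:
            raise ValueError(f"{wl.name} needs more chips than {self.name} has")
        self.queue.append(wl)
        self._schedule()

    def finish(self, name: str) -> None:
        wl = next(w for w in self.running if w.name == name)
        self.running.remove(wl)
        wl.status = "completed"
        self.completed.append(wl)
        self.free_chips += wl.chips  # hardware returns to the shared pool
        self._schedule()

    def _schedule(self) -> None:
        # Strict FIFO (an assumption of this sketch): start queued workloads
        # in submission order for as long as the head of the queue fits.
        while self.queue and self.queue[0].chips <= self.free_chips:
            wl = self.queue.popleft()
            self.free_chips -= wl.chips
            wl.status = "running"
            self.running.append(wl)

    def snapshot(self) -> dict[str, str]:
        # Map each workload name to its current status.
        return {w.name: w.status for w in [*self.completed, *self.running, *self.queue]}


if __name__ == "__main__":
    cluster = Cluster("demo-cluster", total_chips=16)
    for name, chips in [("train-a", 8), ("train-b", 8), ("train-c", 16)]:
        cluster.submit(Workload(name, chips))
    print(cluster.snapshot())  # train-a and train-b running, train-c queued
    cluster.finish("train-a")
    print(cluster.snapshot())  # train-c still queued: only 8 chips are free
    cluster.finish("train-b")
    print(cluster.snapshot())  # train-c now running on the freed chips
```

Strict FIFO keeps the sketch short but lets one large job block smaller ones behind it; a production queueing system applies its own admission policy, so treat this only as a picture of the concepts.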

The ideal workflow starts by provisioning clusters for all of the ML hardware you have reserved. Then, without re-provisioning, submit jobs as needed. Because clusters are not re-provisioned between jobs, and because jobs use Docker containers with pre-installed dependencies and ahead-of-time cross-compilation, queued jobs start with minimal delay. Further, because workloads return their hardware to the shared pool when they complete, developers can make better use of finite hardware resources. Automated tests can also run overnight, when resources tend to be underutilized.
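The time saved by provisioning once can be written out under a simple, hypothetical model: n jobs run back-to-back on the same hardware, provisioning takes time P, job i needs startup time S_i (installing dependencies, compiling) and runtime d_i, and S'_i ≤ S_i is its startup time with a prebuilt container and ahead-of-time compilation. These symbols are illustrative assumptions, not measurements of xpk.

```latex
\begin{aligned}
T_{\text{reprovision}} &= \sum_{i=1}^{n} \left(P + S_i + d_i\right)
  = nP + \sum_{i=1}^{n} S_i + \sum_{i=1}^{n} d_i \\
T_{\text{reuse}} &= P + \sum_{i=1}^{n} \left(S'_i + d_i\right) \\
T_{\text{reprovision}} - T_{\text{reuse}} &= (n-1)P + \sum_{i=1}^{n} \left(S_i - S'_i\right) \;\ge\; 0
\end{aligned}
```

The (n-1)P term grows with every job submitted, which is why the workflow provisions clusters once and afterwards only submits jobs.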

XPK supports a variety of hardware accelerators.

| Accelerator Type | Create Cluster | Create Workload |
|---|---|---|
| TPU Ironwood (tpu7x) [NEW] | docs | docs |
| TPU Trillium (v6e) | docs | docs |
| TPU v5p | docs | docs |
| TPU v5e | docs | docs |
| TPU v4 | docs | docs |
| GPU A100 | docs | docs |
| GPU A3-Highgpu (h100) | docs | docs |
| GPU A3-Mega (h100-mega) | docs | docs |
| GPU A3-Ultra (h200) | docs | docs |
| GPU A4 (b200) | docs | docs |
| GPU A4X (gb200) | docs | docs |
| CPU n2-standard-32 | docs | docs |

XPK also supports the following Google Cloud Storage solutions:

| Storage Type | Documentation |
|---|---|
| Cloud Storage FUSE | docs |
| Filestore | docs |
| Parallelstore | docs |
| Block storage (Persistent Disk, Hyperdisk) | docs |

Documentation

Contributing

Please read contributing.md for details on our code of conduct and the process for submitting pull requests to us.

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
