
Commit 25bef9f

Add API kind overview docs (#2230)

(cherry picked from commit 6a7d6f6)

1 parent 95fb0c6 commit 25bef9f

File tree

10 files changed: +99 −29 lines changed

docs/overview.md

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
# Overview

## Cluster

The Cortex cluster is an EKS (Kubernetes) cluster in a dedicated VPC in your AWS account.

### Worker node groups

The Kubernetes cluster uses EC2 Auto Scaling groups for its worker node groups. Cortex supports most EC2 instance types, and the necessary device drivers are installed to expose GPUs and Inferentia chips to your workloads. Reserved and spot instances can be used to help reduce costs.
Cortex uses the Kubernetes Cluster Autoscaler to scale the appropriate node groups to satisfy the compute demands of your workloads.
### Networking

By default, a new dedicated VPC is created for the cluster during installation.

Two Network Load Balancers (NLBs) are created to route traffic to the cluster: one is dedicated to traffic to your APIs, and the other to API management requests from your CLI or Python client. Traffic to the load balancers can be secured and restricted based on your cluster configuration.
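
For example, here is a minimal sketch (assuming boto3 is installed and AWS credentials are configured) of listing the NLBs in your account to find their DNS names; the region is a placeholder:

```python
import boto3

# List the network load balancers in the cluster's region; the two Cortex
# NLBs (API traffic and API management) will appear among them.
elbv2 = boto3.client("elbv2", region_name="us-west-2")

for lb in elbv2.describe_load_balancers()["LoadBalancers"]:
    if lb["Type"] == "network":
        print(lb["LoadBalancerName"], lb["DNSName"])
```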
### Observability

All logs from the Cortex cluster are pushed to a CloudWatch log group using Fluent Bit. An in-cluster Prometheus installation collects metrics for observability and autoscaling purposes. Metrics and dashboards pertaining to your APIs and instance usage can be viewed and modified via Grafana.
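
As an illustration, a hedged sketch of pulling recent log events from the cluster's CloudWatch log group with boto3 (the log group name is a placeholder; the actual name depends on your cluster configuration):

```python
import boto3

logs = boto3.client("logs", region_name="us-west-2")

# Fetch recent events from the cluster's log group; substitute your
# cluster's actual log group name for the placeholder below.
response = logs.filter_log_events(
    logGroupName="<cortex-cluster-log-group>",
    limit=25,
)

for event in response["events"]:
    print(event["message"])
```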
## Deploying to the cluster

After a successful Cortex cluster installation, you can use the Cortex CLI or Python client to deploy different types of workloads. The clients use AWS credentials to authenticate to the Cortex cluster.
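
For instance, a minimal sketch of a deployment with the Python client; the environment name and the api_spec fields are illustrative, and method names can vary between Cortex versions, so treat this as a sketch rather than a reference:

```python
import cortex

# Connect to an existing cluster environment; "aws" is a placeholder for
# the environment name configured when the cluster was created.
cx = cortex.client("aws")

# A minimal, illustrative API specification; consult the configuration
# docs for the fields required by your Cortex version.
api_spec = {
    "name": "hello-api",
    "kind": "RealtimeAPI",
}

cx.create_api(api_spec)
```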
Cortex uses a collection of containers, referred to as a pod, as the atomic unit; scaling and replication occur at the pod level. The orchestration and scaling of pods is unique to each type of workload:
* Realtime
* Async
* Batch
* Task

Visit the workload-specific documentation for more details.

docs/summary.md

Lines changed: 5 additions & 4 deletions
@@ -1,6 +1,7 @@
 # Summary
 
 * [Get started](start.md)
+* [Overview](overview.md)
 
 ## Clusters
 
@@ -29,7 +30,7 @@
 
 ## Workloads
 
-* [Realtime APIs](workloads/realtime/realtime-apis.md)
+* [Realtime](workloads/realtime/realtime.md)
   * [Example](workloads/realtime/example.md)
   * [Configuration](workloads/realtime/configuration.md)
   * [Containers](workloads/realtime/containers.md)
@@ -38,18 +39,18 @@
   * [Metrics](workloads/realtime/metrics.md)
   * [Statuses](workloads/realtime/statuses.md)
   * [Troubleshooting](workloads/realtime/troubleshooting.md)
-* [Async APIs](workloads/async/async-apis.md)
+* [Async](workloads/async/async.md)
   * [Example](workloads/async/example.md)
   * [Configuration](workloads/async/configuration.md)
   * [Containers](workloads/async/containers.md)
   * [Statuses](workloads/async/statuses.md)
-* [Batch APIs](workloads/batch/batch-apis.md)
+* [Batch](workloads/batch/batch.md)
   * [Example](workloads/batch/example.md)
   * [Configuration](workloads/batch/configuration.md)
   * [Containers](workloads/batch/containers.md)
   * [Jobs](workloads/batch/jobs.md)
   * [Statuses](workloads/batch/statuses.md)
-* [Task APIs](workloads/task/task-apis.md)
+* [Task](workloads/task/task.md)
   * [Example](workloads/task/example.md)
   * [Configuration](workloads/task/configuration.md)
   * [Containers](workloads/task/containers.md)

docs/workloads/async/async-apis.md

Lines changed: 0 additions & 16 deletions
This file was deleted.

docs/workloads/async/async.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
# Async

Async APIs are designed for asynchronous workloads, in which the user submits a request and retrieves the result later (either by polling or through a webhook).

Async APIs are a good fit for users who want to submit longer workloads (such as video, audio, or document processing) and do not need the result immediately or synchronously.

## How it works

When you deploy an Async API, Cortex creates an SQS queue, a pool of Async Gateway workers, and a pool of workers running your containers.

The Async Gateway is responsible for submitting workloads to the queue and for retrieving workload statuses and results. Cortex fully implements and manages the Async Gateway and the queue.

The pool of workers running your containers autoscales based on the average number of messages in the queue and can scale down to 0 (if configured to do so).

![](https://user-images.githubusercontent.com/7456627/111491999-9b67f100-873c-11eb-87f0-effcf4aab01b.png)
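
To make the flow concrete, here is a hedged client-side sketch of the submit-then-poll pattern (the endpoint URL and the response field names are assumptions for illustration, not the exact Async Gateway schema):

```python
import time

import requests

# Submit a workload; the Async Gateway enqueues it on SQS and returns an
# ID that can be used to retrieve the result later.
submit = requests.post(
    "http://<api-load-balancer>/my-async-api",
    json={"url": "https://example.com/video.mp4"},
)
request_id = submit.json()["id"]  # field name is an assumption

# Poll until a worker has processed the message and a result is available.
while True:
    poll = requests.get(f"http://<api-load-balancer>/my-async-api/{request_id}")
    result = poll.json()
    if result.get("status") == "completed":  # status values are assumptions
        print(result["result"])
        break
    time.sleep(2)
```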

docs/workloads/batch/batch-apis.md

Lines changed: 0 additions & 3 deletions
This file was deleted.

docs/workloads/batch/batch.md

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
# Batch

Batch APIs run distributed and fault-tolerant batch processing jobs on demand.

Batch APIs are a good fit for users who want to break up their workloads and distribute them across a dedicated pool of workers (for example, running inference on a set of images).

## How it works

When you deploy a Batch API, Cortex creates an endpoint to receive job submissions.

Upon job submission, Cortex responds with a job ID and asynchronously triggers a batch job.

First, Cortex deploys an enqueuer, which breaks up the data in the job into batches and pushes them onto an SQS FIFO queue.

After enqueuing is complete, Cortex initializes the requested number of worker pods and attaches a dequeuer sidecar to each pod. The dequeuer is responsible for retrieving batches from the queue and making an HTTP request to your pod for each batch.

After the worker pods have emptied the queue, the job is marked as complete, and Cortex terminates the worker pods and deletes the SQS queue.

You can make GET requests to the Batch API endpoint to get the status of the job and metrics such as the number of completed and failed batches.
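
For illustration, a hedged sketch of submitting a job and checking its status (the payload shape and the response and query fields are assumptions for this example):

```python
import requests

# Submit a batch job; Cortex responds with a job ID, then the enqueuer
# splits the items into batches and pushes them onto the SQS FIFO queue.
submit = requests.post(
    "http://<api-load-balancer>/my-batch-api",
    json={
        "workers": 4,  # number of worker pods to initialize
        "item_list": {
            "items": [f"s3://my-bucket/images/{i}.jpg" for i in range(100)],
            "batch_size": 10,  # items grouped into each batch
        },
    },
)
job_id = submit.json()["job_id"]  # field name is an assumption

# Check the job status and batch metrics with a GET request.
status = requests.get(f"http://<api-load-balancer>/my-batch-api?jobID={job_id}")
print(status.json())
```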

docs/workloads/realtime/realtime-apis.md

Lines changed: 0 additions & 3 deletions
This file was deleted.

docs/workloads/realtime/realtime.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
# Realtime

Realtime APIs respond to requests synchronously and autoscale based on in-flight request volume.

Realtime APIs are a good fit for users who want to run stateless containers as a scalable microservice (for example, deploying machine learning models as APIs).

## How it works

When you deploy a Realtime API, Cortex initializes a pool of worker pods and attaches a proxy sidecar to each of the pods.

The proxy is responsible for receiving incoming requests, queueing them (if necessary), and forwarding them to your pod when it is ready. Autoscaling is based on aggregate in-flight request volume, which is published by the proxy sidecars.
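
A hedged sketch of calling a deployed Realtime API (the endpoint URL and payload are illustrative placeholders):

```python
import requests

# Synchronous request/response: the proxy sidecar receives the request,
# queues it if all workers are busy, and forwards it to your container.
response = requests.post(
    "http://<api-load-balancer>/my-realtime-api",
    json={"text": "classify me"},
)
print(response.status_code, response.json())
```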

docs/workloads/task/task-apis.md

Lines changed: 0 additions & 3 deletions
This file was deleted.

docs/workloads/task/task.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
# Task

Task APIs provide lambda-style execution of containers. They are useful for running your containers on demand.

Task APIs are a good fit when you need to trigger container execution via an HTTP request. They can be used to run tasks (e.g. training models), and can be configured as task runners for orchestrators (such as Airflow).

## How it works

When you deploy a Task API, an endpoint is created to receive task submissions.

Upon submitting a task, Cortex responds with a task ID and asynchronously triggers the execution of the task.

Cortex initializes one or more worker pods based on your API specification. After the worker pod(s) run to completion, the task is marked as completed and the worker pod(s) are terminated.

You can make GET requests to the Task API endpoint to retrieve the status of the task.
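
As with the other workload kinds, a hedged client-side sketch (the payload shape and the response and query fields are assumptions for illustration):

```python
import requests

# Trigger a task execution; Cortex responds with a task ID and runs the
# worker pod(s) asynchronously.
submit = requests.post(
    "http://<api-load-balancer>/my-task-api",
    json={"config": {"epochs": 10}},  # arbitrary payload passed to the task
)
task_id = submit.json()["task_id"]  # field name is an assumption

# Retrieve the task's status with a GET request.
status = requests.get(f"http://<api-load-balancer>/my-task-api?taskID={task_id}")
print(status.json())
```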
