
Commit a0413eb

togashidm, tmetsch, Paulina-Osikoya and obiTrinobiIntel authored
Release 0.2.0 (#183)
- Release a configurable CPU vertical scaling actuator.
- CPU vertical scaling actuator as plugin.
- Adoption of a bi-directional gRPC stream for NextState-Planner communication.
- Improvements to the analytical scripts for example usage.
- Enhancements to the RDT/DRC actuator.
- Bump k8s dependencies to 1.26 and update CI tool versions.
- Update and optimize the container base images.
- PodState refactored to allow better access to resources/annotations.
- Many security improvements at container level.
- Documentation updates.

Co-authored-by: tmetsch <tmetsch@users.noreply.github.com>
Co-authored-by: togashidm <togashidm@users.noreply.github.com>
Co-authored-by: Paulina-Osikoya <Paulina-Osikoya@users.noreply.github.com>
Co-authored-by: obiTrinobiIntel <obiTrinobiIntel@users.noreply.github.com>
1 parent 443c316 commit a0413eb


67 files changed, +3633 / -1202 lines changed

.dockerignore

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+_attic/
+vendor/
+bin/
+# coverage outputs etc.
+*.out
+*.dot*
+coverage.html
+# IDE related stuff.
+.vscode/
+.idea/
+# profiling
+*.test
+*.profile
+*.cpuprofile
+# in-tree folders not used for build process
+bin
+docs
+.git
+.github
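
Docker evaluates `.dockerignore` patterns with Go's `filepath.Match` rules (plus its own `**` wildcard), and excluding a directory also excludes everything below it. The Go sketch below approximates that behaviour for the plain patterns added here; it is not Docker's matcher, it assumes a Unix host, and it ignores `!` exceptions and `**`. The `excluded` helper is hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// dockerignore lists the patterns added in this commit (comment lines omitted).
var dockerignore = []string{
	"_attic/", "vendor/", "bin/", "*.out", "*.dot*", "coverage.html",
	".vscode/", ".idea/", "*.test", "*.profile", "*.cpuprofile",
	"bin", "docs", ".git", ".github",
}

// excluded approximates Docker's rule on a Unix host: relPath (slash-separated,
// relative to the build-context root) is excluded when a pattern matches the
// path itself or any of its parent directories. "!" exceptions and "**" are
// deliberately not handled in this sketch.
func excluded(patterns []string, relPath string) bool {
	parts := strings.Split(filepath.Clean(relPath), "/")
	for i := 1; i <= len(parts); i++ {
		candidate := strings.Join(parts[:i], "/") // "bin", then "bin/planner", ...
		for _, p := range patterns {
			// filepath.Clean drops the trailing slash, so "bin/" behaves like "bin".
			if ok, err := filepath.Match(filepath.Clean(p), candidate); err == nil && ok {
				return true
			}
		}
	}
	return false
}

func main() {
	for _, p := range []string{"bin/planner", "coverage.out", "cmd/coverage.out", "cmd/main.go", ".git/HEAD"} {
		fmt.Printf("%-18s excluded=%v\n", p, excluded(dockerignore, p))
	}
}
```

One consequence the sketch makes visible: `*.out` only matches files at the root of the build context, because `*` never crosses `/`, so a nested `cmd/coverage.out` is still sent to the daemon unless a `**/*.out` pattern is added.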

.github/workflows/sca.yml

Lines changed: 12 additions & 8 deletions
@@ -4,6 +4,8 @@ on:
 branches: [ '**' ]
 pull_request:
 branches: [ '**' ]
+permissions:
+contents: read
 jobs:
 shellcheck:
 name: Shellcheck
@@ -20,19 +22,22 @@ jobs:
 name: Hadolint
 steps:
 - uses: actions/checkout@v3
-- run: wget -q https://github.com/hadolint/hadolint/releases/download/v2.10.0/hadolint-Linux-x86_64 -O hadolint; chmod +x hadolint ; find . -type f \( -name "Dockerfile*" \) -print0 | xargs -n 1 -0 ./hadolint ;
+- run: wget -q https://github.com/hadolint/hadolint/releases/download/v2.12.0/hadolint-Linux-x86_64 -O hadolint; chmod +x hadolint ; find . -type f \( -name "Dockerfile*" \) -print0 | xargs -n 1 -0 ./hadolint ;
 gofmt-imports:
 runs-on: ubuntu-latest
 name: Go Fmt and Go Import
 steps:
 - uses: actions/checkout@v3
 - uses: actions/setup-go@v3
 with:
-go-version-file: 'go.mod'
+go-version: 1.19
 - run: |
-go install golang.org/x/tools/cmd/goimports@v0.1.12 && goimports -l . && gofmt -l .
+go install golang.org/x/tools/cmd/goimports@v0.6.0 && goimports -l . && gofmt -l .
 shell: bash
 golangci:
+permissions:
+contents: read
+pull-requests: read
 runs-on: ubuntu-latest
 name: lint
 steps:
@@ -41,8 +46,7 @@ jobs:
 go-version: 1.19
 - uses: actions/checkout@v3
 - name: golangci-lint
-uses: golangci/golangci-lint-action@v3
-with:
-version: v1.50.0
-# Additional linting tools can be added here
-args: --enable=revive,errcheck,goimports,govet,nilerr,gosec --timeout=5m
+run: |
+go install github.com/golangci/golangci-lint/cmd/golangci-lint@v1.51.2
+make golangci-lint
+shell: bash

.github/workflows/test-build.yml

Lines changed: 2 additions & 0 deletions
@@ -4,6 +4,8 @@ on:
 branches: [ '**' ]
 pull_request:
 branches: [ '**' ]
+permissions:
+contents: read
 jobs:
 build:
 runs-on: ubuntu-latest

Dockerfile

Lines changed: 2 additions & 4 deletions
@@ -8,12 +8,10 @@ WORKDIR /app
 COPY . ./
 
 RUN make prepare-build build \
-&& go run github.com/google/go-licenses@v1.3.1 save "./..." --save_path licenses \
+&& go run github.com/google/go-licenses@v1.6.0 save "./..." --save_path licenses \
 && hack/additional-licenses.sh
 
-FROM alpine:3.16
-
-RUN adduser -D nonroot
+FROM scratch
 
 WORKDIR /app

Makefile

Lines changed: 17 additions & 6 deletions
@@ -2,8 +2,9 @@ BINARY_NAME=planner
 SCALEOUT_PLUGIN=scale_out
 RMPOD_PLUGIN=rm_pod
 RDT_PLUGIN=rdt
+CPU_PLUGIN=cpu_scale
 GO_CILINT_CHECKERS=errcheck,goimports,gosec,gosimple,govet,ineffassign,nilerr,revive,staticcheck,unused
-DOCKER_IMAGE_VERSION=0.1.1
+DOCKER_IMAGE_VERSION=0.2.0
 
 api:
 hack/generate_code.sh
@@ -19,18 +20,28 @@ gen_code: api proto
 build:
 CGO_ENABLED=0 go build -o bin/${BINARY_NAME} cmd/main.go
 
-build-plugins:
+build-plugin-scaleout:
 CGO_ENABLED=0 go build -o bin/plugins/${SCALEOUT_PLUGIN} plugins/${SCALEOUT_PLUGIN}/cmd/${SCALEOUT_PLUGIN}.go
+
+build-plugin-rmpod:
 CGO_ENABLED=0 go build -o bin/plugins/${RMPOD_PLUGIN} plugins/${RMPOD_PLUGIN}/cmd/${RMPOD_PLUGIN}.go
+
+build-plugin-rdt:
 CGO_ENABLED=0 go build -o bin/plugins/${RDT_PLUGIN} plugins/${RDT_PLUGIN}/cmd/${RDT_PLUGIN}.go
 
+build-plugin-cpu:
+CGO_ENABLED=0 go build -o bin/plugins/${CPU_PLUGIN} plugins/${CPU_PLUGIN}/cmd/${CPU_PLUGIN}.go
+
+build-plugins: build-plugin-scaleout build-plugin-rmpod build-plugin-rdt build-plugin-cpu
+
 controller-images:
-docker build -t planner:${DOCKER_IMAGE_VERSION} .
+docker build -t planner:${DOCKER_IMAGE_VERSION} . --no-cache --pull
 
 plugin-images:
-docker build -t scaleout:${DOCKER_IMAGE_VERSION} -f plugins/scale_out/Dockerfile .
-docker build -t rmpod:${DOCKER_IMAGE_VERSION} -f plugins/rm_pod/Dockerfile .
-docker build -t rdt:${DOCKER_IMAGE_VERSION} -f plugins/rdt/Dockerfile .
+docker build -t scaleout:${DOCKER_IMAGE_VERSION} -f plugins/scale_out/Dockerfile . --no-cache --pull
+docker build -t rmpod:${DOCKER_IMAGE_VERSION} -f plugins/rm_pod/Dockerfile . --no-cache --pull
+docker build -t rdt:${DOCKER_IMAGE_VERSION} -f plugins/rdt/Dockerfile . --no-cache --pull
+docker build -t cpuscale:${DOCKER_IMAGE_VERSION} -f plugins/cpu_scale/Dockerfile . --no-cache --pull
 
 all-images: controller-images plugin-images

README.md

Lines changed: 13 additions & 2 deletions
@@ -58,14 +58,25 @@ Step 1) add the CRDs:
 
 Step 2) deploy the planner (make sure to adapt the configs to your environment):
 
-$ k apply -f artefacts/deploy/manifest.yaml
+$ k create ns ido
+$ k apply -n ido -f artefacts/deploy/manifest.yaml
 
 Step 3) deploy the actuators of interest using:
 
-$ k apply -f plugins/<name>/<name>.yaml
+$ k apply -n ido -f plugins/<name>/<name>.yaml
 
 These steps should be followed by setting up your default profiles (if needed).
 
+We recommend the usage of a service mesh like [Linkerd](https://linkerd.io/) or [Istio](https://istio.io/) to ensure
+encryption and monitoring capabilities for the subcomponents of the planning framework themselves. After creating the
+namespace, enable auto-injection; For Linkerd do:
+
+$ k annotate ns ido linkerd.io/inject=enabled
+
+or for Istio use:
+
+$ k label namespace ido istio-injection=enabled --overwrite
+
 For more information on running and configuring the planner see the [getting started](docs/getting_started.md) guide.
 
 ## Internals

artefacts/deploy/manifest.yaml

Lines changed: 44 additions & 20 deletions
@@ -102,12 +102,12 @@ rules:
 apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRoleBinding
 metadata:
-namespace: default
 name: planner-role-binding
+namespace: ido
 subjects:
 - kind: ServiceAccount
-namespace: default
 name: planner-service-account
+namespace: ido
 roleRef:
 kind: ClusterRole
 name: planner-role
@@ -122,28 +122,48 @@ metadata:
 spec:
 containers:
 - name: mongodb
-image: mongo
+image: mongo:6
 ports:
 - containerPort: 27017
+securityContext:
+capabilities:
+drop: [ 'ALL' ]
+seccompProfile:
+type: RuntimeDefault
+allowPrivilegeEscalation: false
+readOnlyRootFilesystem: true
+runAsNonRoot: true
+runAsUser: 10001
+runAsGroup: 10001
+volumeMounts:
+- name: mongo-tmp
+mountPath: /tmp/
+- name: data
+mountPath: /data/db
 resources:
 limits:
 memory: "4000Mi"
 cpu: "2000m"
 requests:
 memory: "256Mi"
 cpu: "500m"
+volumes:
+- name: mongo-tmp
+emptyDir:
+- name: data
+emptyDir:
 tolerations:
-- key: node-role.kubernetes.io/master
-operator: Exists
-- key: node-role.kubernetes.io/control-plane
-operator: Exists
+- key: node-role.kubernetes.io/master
+operator: Exists
+- key: node-role.kubernetes.io/control-plane
+operator: Exists
 affinity:
 nodeAffinity:
 requiredDuringSchedulingIgnoredDuringExecution:
 nodeSelectorTerms:
-- matchExpressions:
-- key: node-role.kubernetes.io/control-plane
-operator: Exists
+- matchExpressions:
+- key: node-role.kubernetes.io/control-plane
+operator: Exists
 ---
 apiVersion: v1
 kind: Service
@@ -179,17 +199,21 @@ spec:
 serviceAccountName: planner-service-account
 containers:
 - name: planner
-image: 127.0.0.1:5000/planner:0.1.1
+image: 127.0.0.1:5000/planner:0.2.0
 ports:
 - containerPort: 33333
 imagePullPolicy: Always
 args: [ "-config", "/config/defaults.json", "-v", "2" ]
 securityContext:
 capabilities:
-drop:
-- all
+drop: [ 'ALL' ]
+seccompProfile:
+type: RuntimeDefault
+allowPrivilegeEscalation: false
+readOnlyRootFilesystem: true
 runAsNonRoot: true
 runAsUser: 10001
+runAsGroup: 10001
 resources:
 limits:
 memory: "1000Mi"
@@ -207,17 +231,17 @@ spec:
 - name: MONGO_URL
 value: "mongodb://planner-mongodb-service:27017/"
 tolerations:
-- key: node-role.kubernetes.io/master
-operator: Exists
-- key: node-role.kubernetes.io/control-plane
-operator: Exists
+- key: node-role.kubernetes.io/master
+operator: Exists
+- key: node-role.kubernetes.io/control-plane
+operator: Exists
 affinity:
 nodeAffinity:
 requiredDuringSchedulingIgnoredDuringExecution:
 nodeSelectorTerms:
-- matchExpressions:
-- key: node-role.kubernetes.io/control-plane
-operator: Exists
+- matchExpressions:
+- key: node-role.kubernetes.io/control-plane
+operator: Exists
 volumes:
 - name: planner-config
 configMap:

artefacts/examples/example_deployment.yaml

Lines changed: 10 additions & 0 deletions
@@ -22,6 +22,16 @@ spec:
 env:
 - name: WORKERS
 value: "2"
+securityContext:
+capabilities:
+drop: [ 'ALL' ]
+seccompProfile:
+type: RuntimeDefault
+allowPrivilegeEscalation: false
+readOnlyRootFilesystem: true
+runAsNonRoot: true
+runAsUser: 10001
+runAsGroup: 10001
 restartPolicy: Always
 ---
 apiVersion: v1

defaults.json

Lines changed: 2 additions & 2 deletions
@@ -17,8 +17,8 @@
 "query": "avg(collectd_cpu_percent{exported_instance=~\"%s\"})by(exported_instance)"
 },
 {
-"name": "llc_value",
-"query": "avg(rate(collectd_intel_pmu_counter_total{type=\"cache-misses\", exported_instance=~\"%s\"}[30s]))by(exported_instance)"
+"name": "ipc_value",
+"query": "avg(rate(collectd_intel_pmu_counter_total{type=\"instructions\",exported_instance=~\"%[1]s\"}[30s]))by(exported_instance)/avg(rate(collectd_intel_pmu_counter_total{type=\"cpu-cycles\",exported_instance=~\"%[1]s\"}[30s]))by(exported_instance)"
 }
 ]
 },
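
The `llc_value` KPI (cache-miss rate) is replaced by `ipc_value`: the rate of retired instructions divided by the rate of CPU cycles, i.e. instructions per cycle. Because the instance selector now appears twice, the template uses `%[1]s`. Assuming the planner fills these templates with Go's `fmt.Sprintf` (the `%s` verbs suggest so, but that code is not part of this diff), the explicit argument index makes both placeholders reuse the single instance argument, whereas two plain `%s` verbs with one argument would render the second as `%!s(MISSING)`. The `render` helper below is a hypothetical stand-in.

```go
package main

import "fmt"

// PromQL templates from defaults.json with the JSON string escaping removed.
const (
	// Previous template: a single %s placeholder.
	llcQuery = `avg(rate(collectd_intel_pmu_counter_total{type="cache-misses", exported_instance=~"%s"}[30s]))by(exported_instance)`
	// New template: instructions/cycles, with %[1]s used twice.
	ipcQuery = `avg(rate(collectd_intel_pmu_counter_total{type="instructions",exported_instance=~"%[1]s"}[30s]))by(exported_instance)/avg(rate(collectd_intel_pmu_counter_total{type="cpu-cycles",exported_instance=~"%[1]s"}[30s]))by(exported_instance)`
)

// render fills a query template with one instance selector. The explicit
// index in %[1]s lets every placeholder reuse that single argument.
func render(tmpl, instance string) string {
	return fmt.Sprintf(tmpl, instance)
}

func main() {
	instance := "my-app-.*" // hypothetical exported_instance regex
	fmt.Println(render(llcQuery, instance))
	fmt.Println(render(ipcQuery, instance))
}
```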

docs/getting_started.md

Lines changed: 19 additions & 2 deletions
@@ -100,7 +100,7 @@ declared Intents. Over time the planner will learn a scaling model and use that
 For more details on models see the [actuators'](actuators.md) documentation.
 
 This intent declaration with the demo assumes a service mesh is used to measure the KPIs. The KPI profiles used match
-the default queries described earlier.
+the default queries described earlier.
 
 **_Note_** that for this demonstration, it is assumed that proactive and opportunistic planning are enabled. See the
 configuration references for more details on this.
@@ -171,7 +171,6 @@ Each actuator will have its own configuration.
 | plugin_manager_endpoint | String defining the plugin manager's endpoint to which actuators can register themselves. |
 | plugin_manager_port | Port number of the plugin manager's endpoint to which actuators can register themselves. |
 
-
 ### remove pod actuator
 
 | Property | Description |
@@ -184,6 +183,24 @@ Each actuator will have its own configuration.
 | plugin_manager_endpoint | String defining the plugin manager's endpoint to which actuators can register themselves. |
 | plugin_manager_port | Port number of the plugin manager's endpoint to which actuators can register themselves. |
 
+### cpu scale actuator
+
+| Property | Description |
+|------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| interpreter | Path to a python interpreter. |
+| analytics_script | Path to the analytics python script used to determine the scaling model. |
+| cpu_max | Maximum CPU resource units (in millis) that the actuator will allow. |
+| cpu_rounding | Multiple of 10 defining how to round up CPU resource units. |
+| cpu_safeguard_factor | Define the factor the actuator will use to stay below the targeted objective. |
+| look_back | Time in minutes defining how old the ML model can be. |
+| max_proactive_cpu | Maximum CPU resource units (in millis) that the actuator will allow when proactively scaling. If set to 0, proactive planning is disabled. A fraction of this value is used for proactive scale ups/downs. |
+| proactive_latency_percentage | Float defining the potential percentage change in latency by scaling the resources. |
+| endpoint | Name of the endpoint to use for registering this plugin. |
+| port | Port this actuator should listen on. |
+| mongo_endpoint | URI for the Mongo database - representing the knowledge base of the system. |
+| plugin_manager_endpoint | String defining the plugin manager's endpoint to which actuators can register themselves. |
+| plugin_manager_port | Port number of the plugin manager's endpoint to which actuators can register themselves. |
+
 ### RDT actuator
 
 | Property | Description |
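
The cpu scale actuator's options map naturally onto a Go struct. The sketch below is hypothetical: the struct, its JSON layout, the sample values and the `roundUpCPU` helper are assumptions based only on the property names and descriptions in the table above, not the plugin's actual configuration code. It illustrates the two behaviours the table spells out: rounding CPU units (in millis) up to a multiple of `cpu_rounding`, capped here at `cpu_max`, and `max_proactive_cpu` set to 0 disabling proactive planning.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cpuScaleConfig mirrors the documented properties (hypothetical layout).
type cpuScaleConfig struct {
	Interpreter                string  `json:"interpreter"`
	AnalyticsScript            string  `json:"analytics_script"`
	CPUMax                     int64   `json:"cpu_max"`
	CPURounding                int64   `json:"cpu_rounding"`
	CPUSafeguardFactor         float64 `json:"cpu_safeguard_factor"`
	LookBack                   int     `json:"look_back"`
	MaxProactiveCPU            int64   `json:"max_proactive_cpu"`
	ProactiveLatencyPercentage float64 `json:"proactive_latency_percentage"`
	Endpoint                   string  `json:"endpoint"`
	Port                       int     `json:"port"`
	MongoEndpoint              string  `json:"mongo_endpoint"`
	PluginManagerEndpoint      string  `json:"plugin_manager_endpoint"`
	PluginManagerPort          int     `json:"plugin_manager_port"`
}

// proactiveEnabled follows the table: max_proactive_cpu == 0 disables proactive planning.
func (c cpuScaleConfig) proactiveEnabled() bool { return c.MaxProactiveCPU != 0 }

// roundUpCPU rounds millicores up to the next multiple of CPURounding and
// caps the result at CPUMax (assumed semantics for this sketch).
func (c cpuScaleConfig) roundUpCPU(millis int64) int64 {
	if c.CPURounding > 0 {
		millis = (millis + c.CPURounding - 1) / c.CPURounding * c.CPURounding
	}
	if c.CPUMax > 0 && millis > c.CPUMax {
		millis = c.CPUMax
	}
	return millis
}

func main() {
	// Hypothetical values, not taken from the repository's defaults.
	raw := []byte(`{"cpu_max": 4000, "cpu_rounding": 100, "max_proactive_cpu": 0, "look_back": 20}`)
	var cfg cpuScaleConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		panic(err)
	}
	fmt.Println(cfg.roundUpCPU(1234), cfg.roundUpCPU(5000), cfg.proactiveEnabled()) // 1300 4000 false
}
```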
