Merged
34 changes: 34 additions & 0 deletions doc/telemetry.md
@@ -0,0 +1,34 @@
# Telemetry in Taskmanager

The TaskManager application is equipped with OpenTelemetry capabilities.
This means it supports the use of an OpenTelemetry Java agent, even though it is not manually instrumented.

## Traces

When the OpenTelemetry Java agent is used, the RabbitMQ library within TaskManager should ensure that spans are created on task pickup/delivery.
Taskmanager tries to keep the trace of each task intact as much as possible, for instance when switching threads.
Taskmanager currently does not explicitly define spans itself; it relies on the automatic spans created when using the Java agent.
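
As an illustration of what keeping a trace intact across a thread switch involves, here is a minimal sketch. It is not TaskManager's code: it uses a plain `ThreadLocal` as a stand-in for the real trace context, and the class and method names (`TraceContextSketch`, `wrap`, `seenOnWorkerThread`) are hypothetical. The idea is to capture the caller's context when the task is handed over and to restore it on the thread that runs the task.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for a trace context; not part of TaskManager or OpenTelemetry. */
final class TraceContextSketch {
  // Assumption: the "trace id" of the current thread is kept in a plain ThreadLocal.
  private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

  private TraceContextSketch() {}

  /** Captures the caller's context now and restores it around the task on the executing thread. */
  static <T> Callable<T> wrap(final Callable<T> task) {
    final String captured = CURRENT.get();
    return () -> {
      final String previous = CURRENT.get();
      CURRENT.set(captured);
      try {
        return task.call();
      } finally {
        // Put back what the worker thread had before, so contexts do not leak between tasks.
        if (previous == null) {
          CURRENT.remove();
        } else {
          CURRENT.set(previous);
        }
      }
    };
  }

  /** Runs a task on another thread and returns the trace id that task observed. */
  static String seenOnWorkerThread(final String traceId, final boolean wrapped) {
    final ExecutorService executor = Executors.newSingleThreadExecutor();
    CURRENT.set(traceId);
    try {
      final Callable<String> task = CURRENT::get;
      final Callable<String> toRun = wrapped ? wrap(task) : task;
      return executor.submit(toRun).get();
    } catch (final InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      CURRENT.remove();
      executor.shutdown();
    }
  }
}
```

Without wrapping, the task on the worker thread sees no trace id; with wrapping it sees the caller's. The OpenTelemetry API offers comparable helpers on `io.opentelemetry.context.Context` (for example `Context.current().wrap(...)`); whether TaskManager uses those is not stated here.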

## Metrics

The TaskManager defines a few custom metrics for OpenTelemetry to capture.
These metrics are all defined with the `nl.aerius.TaskManager` instrumentation scope.

| metric name | type | description |
|-----------------------------------------|-----------|----------------------------------------------------------------------|
| `aer.taskmanager.worker_size`<sup>1</sup> | gauge | The number of workers that are configured according to Taskmanager. |
| `aer.taskmanager.current_worker_size`<sup>1</sup> | gauge | The current number of workers in Taskmanager. |
| `aer.taskmanager.running_worker_size`<sup>1</sup> | gauge | The number of workers that are occupied in Taskmanager. |
| `aer.taskmanager.running_client_size`<sup>2</sup> | gauge | The number of workers that are occupied for a specific client queue. |
| `aer.taskmanager.dispatched`<sup>1</sup> | histogram | The number of tasks dispatched. |
| `aer.taskmanager.dispatched.wait`<sup>1</sup> | histogram | The average wait time of tasks dispatched. |
| `aer.taskmanager.dispatched.queue`<sup>2</sup> | histogram | The number of tasks dispatched per client queue. |
| `aer.taskmanager.dispatched.queue.wait`<sup>2</sup> | histogram | The average wait time of tasks dispatched per client queue. |
| `aer.taskmanager.work.load`<sup>1</sup> | gauge | Percentage of workers used in the timeframe (1 minute). |

The metrics carry different attributes to distinguish specific measurements.
* Metrics marked <sup>1</sup> have the attribute `worker_type`.
* Metrics marked <sup>2</sup> have the attributes `worker_type` and `queue_name`.

`worker_type` is the type of worker, e.g. `ops`.
`queue_name` is the queue the task was originally put on, e.g. `...calculator_ui_small`.
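
To make the difference between the two attribute sets concrete, here is a small sketch using only the Java standard library. It is not the TaskManager implementation. The class `WorkerGaugeSketch`, its methods, and the queue names are hypothetical, and it assumes that the value with only `worker_type` is the sum of the per-queue values for that worker type.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical bookkeeping behind gauges keyed by worker_type and queue_name. */
final class WorkerGaugeSketch {
  // worker_type -> queue_name -> number of tasks currently on a worker.
  private final Map<String, Map<String, AtomicInteger>> running = new ConcurrentHashMap<>();

  void taskStarted(final String workerType, final String queueName) {
    running.computeIfAbsent(workerType, k -> new ConcurrentHashMap<>())
        .computeIfAbsent(queueName, k -> new AtomicInteger())
        .incrementAndGet();
  }

  void taskFinished(final String workerType, final String queueName) {
    final AtomicInteger counter = counters(workerType).get(queueName);
    if (counter != null) {
      // Never drop below zero, even if a finish is reported twice.
      counter.updateAndGet(v -> Math.max(0, v - 1));
    }
  }

  /** Value for one (worker_type, queue_name) pair, like the metrics marked with footnote 2. */
  int runningClientSize(final String workerType, final String queueName) {
    final AtomicInteger counter = counters(workerType).get(queueName);
    return counter == null ? 0 : counter.get();
  }

  /** Value for one worker_type across all queues, like the metrics marked with footnote 1. */
  int runningWorkerSize(final String workerType) {
    return counters(workerType).values().stream()
        .mapToInt(AtomicInteger::get)
        .sum();
  }

  private Map<String, AtomicInteger> counters(final String workerType) {
    return running.getOrDefault(workerType, Collections.emptyMap());
  }
}
```

In a real setup these values would be reported through gauge callbacks with the matching attributes; the sketch only shows how one set of counters can yield both the per-queue and the per-worker-type numbers.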
@@ -71,7 +71,10 @@ public void incrementOnWorker(final TaskRecord taskRecord) {
   }

   public int onWorkerTotal(final String queueName) {
-    return tasksOnWorkersPerQueue.entrySet().stream().filter(e -> keyMapper.queueName(e.getKey()).equals(queueName)).mapToInt(e -> e.getValue().get()).sum();
+    return tasksOnWorkersPerQueue.entrySet().stream()
+        .filter(e -> keyMapper.queueName(e.getKey()).equals(queueName))
+        .mapToInt(e -> e.getValue().get())
+        .sum();
   }

   public int onWorker(final TaskRecord taskRecord) {
@@ -20,8 +20,6 @@
 import java.util.Map;
 import java.util.function.IntSupplier;

-import io.opentelemetry.api.common.AttributeKey;
-import io.opentelemetry.api.common.Attributes;
 import io.opentelemetry.api.metrics.ObservableDoubleGauge;

 import nl.aerius.taskmanager.metrics.OpenTelemetryMetrics;
@@ -47,7 +45,8 @@ public void addMetric(final IntSupplier countSupplier, final String workerQueueN
     metrics.put(clientQueueName, OpenTelemetryMetrics.METER
         .gaugeBuilder(METRIC_PREFIX)
         .setDescription(DESCRIPTION)
-        .buildWithCallback(result -> result.record(countSupplier.getAsInt(), workerDefaultAttributes(workerQueueName, clientQueueName))));
+        .buildWithCallback(
+            result -> result.record(countSupplier.getAsInt(), OpenTelemetryMetrics.queueAttributes(workerQueueName, clientQueueName))));
   }

   /**
@@ -60,11 +59,4 @@ public void removeMetric(final String clienQueueName) {
       metrics.remove(clienQueueName).close();
     }
   }
-
-  private static Attributes workerDefaultAttributes(final String workerQueueName, final String clientQueueName) {
-    return Attributes.builder()
-        .put(AttributeKey.stringKey("worker_type"), workerQueueName)
-        .put(AttributeKey.stringKey("client_queue_name"), clientQueueName)
-        .build();
-  }
 }
26 changes: 0 additions & 26 deletions source/taskmanager/telemetry.md

This file was deleted.