Commit 018e419

Copilot and alvarolopez committed
Update documentation and remove requests dependency
- Removed requests dependency from pyproject.toml (no longer needed)
- Updated prometheus-extractor.rst with new configuration parameters:
  - Documented prometheus_metric_name, prometheus_label_type_instance
  - Documented prometheus_step_seconds, prometheus_query_range, prometheus_verify_ssl
  - Removed old prometheus_query and prometheus_timeout references
  - Added detailed explanation of energy calculation formula
  - Updated examples for Scaphandre and custom metrics
  - Updated troubleshooting section
- Updated configuration.rst with new prometheus section:
  - Documented all new configuration options
  - Added reference to prometheus-extractor.rst
  - Explained energy calculation from microwatt samples
- Updated etc/caso/caso.conf.sample with new prometheus configuration options
- Updated poetry.lock to reflect dependency changes
- All tests pass (6/6 energy-related tests)

Co-authored-by: alvarolopez <468751+alvarolopez@users.noreply.github.com>
1 parent 6ac093e commit 018e419

5 files changed

+122
-523
lines changed

doc/source/configuration.rst

Lines changed: 17 additions & 9 deletions
@@ -212,20 +212,28 @@ messenger. Available options:
212212
------------------------
213213

214214
Options defined here configure the Prometheus extractor for gathering energy
215-
consumption metrics. This extractor queries a Prometheus instance to retrieve
216-
energy usage data. Available options:
215+
consumption metrics. This extractor uses the ``prometheus-api-client`` library
216+
to query a Prometheus instance and calculate energy consumption from
217+
instantaneous power samples. Available options:
217218

218219
* ``prometheus_endpoint`` (default: ``http://localhost:9090``), Prometheus
219220
server endpoint URL.
220-
* ``prometheus_query`` (default:
221-
``sum(rate(node_energy_joules_total[5m])) * 300 / 3600000``), PromQL query
222-
to retrieve energy consumption in kWh. This query should return energy
223-
consumption metrics that will be converted to accounting records.
224-
* ``prometheus_timeout`` (default: ``30``), Timeout for Prometheus API
225-
requests in seconds.
221+
* ``prometheus_metric_name`` (default: ``prometheus_value``), Name of the
222+
Prometheus metric to query for energy consumption data.
223+
* ``prometheus_label_type_instance`` (default: ``scaph_process_power_microwatts``),
224+
Value for the ``type_instance`` label used to filter metrics in Prometheus.
225+
* ``prometheus_step_seconds`` (default: ``30``), Frequency between samples in
226+
the time series, in seconds. This is used to calculate energy from power samples.
227+
* ``prometheus_query_range`` (default: ``1h``), Time range for the Prometheus
228+
query (e.g., ``1h``, ``6h``, ``24h``).
229+
* ``prometheus_verify_ssl`` (default: ``true``), Whether to verify SSL
230+
certificates when connecting to Prometheus.
231+
232+
The extractor calculates energy in Watt-hours (Wh) from microwatt power samples
233+
using the formula: ``sum_over_time(metric{labels}[range]) * (step_seconds/3600) / 1000000``.
226234

227235
To use the Prometheus extractor, add ``prometheus`` to the ``extractor`` option
228-
in the main configuration.
236+
in the main configuration. For more details, see :doc:`prometheus-extractor`.
229237

230238
Other cASO configuration options
231239
--------------------------------

doc/source/prometheus-extractor.rst

Lines changed: 93 additions & 46 deletions
@@ -6,6 +6,8 @@ This document provides information on using the Prometheus extractor to gather e
66

77
The Prometheus extractor queries a Prometheus instance to retrieve energy consumption metrics for each VM in the configured projects and generates `EnergyRecord` objects that can be published through cASO's messenger system.
88

9+
The extractor uses the `prometheus-api-client` library to connect to Prometheus and calculate energy consumption in Watt-hours (Wh) from instantaneous power samples stored in Prometheus.
10+
911
## Configuration
1012

1113
To use the Prometheus extractor, add the following configuration to your `caso.conf` file:
@@ -19,63 +21,90 @@ extractor = nova,cinder,prometheus
1921
# Prometheus server endpoint URL
2022
prometheus_endpoint = http://localhost:9090
2123
22-
# PromQL query to retrieve energy consumption in kWh
23-
# Use {{uuid}} as a template variable for the VM UUID
24-
prometheus_query = sum(rate(libvirt_domain_info_energy_consumption_joules_total{uuid=~"{{uuid}}"}[5m])) * 300 / 3600000
24+
# Name of the Prometheus metric to query
25+
prometheus_metric_name = prometheus_value
26+
27+
# Value for the type_instance label
28+
prometheus_label_type_instance = scaph_process_power_microwatts
29+
30+
# Frequency between samples in seconds
31+
prometheus_step_seconds = 30
2532
26-
# Timeout for Prometheus API requests (in seconds)
27-
prometheus_timeout = 30
33+
# Query time range (e.g., '1h', '6h', '24h')
34+
prometheus_query_range = 1h
35+
36+
# Whether to verify SSL when connecting to Prometheus
37+
prometheus_verify_ssl = true
2838
```
2939

3040
## How It Works
3141

3242
The Prometheus extractor:
3343

3444
1. **Scans VMs**: Retrieves the list of VMs from Nova for each configured project
35-
2. **Queries Per VM**: For each VM, executes a customizable Prometheus query
36-
3. **Template Variables**: Replaces `{{uuid}}` in the query with the actual VM UUID
37-
4. **Creates Records**: Generates an `EnergyRecord` for each VM with energy consumption data
45+
2. **Queries Per VM**: For each VM, executes a Prometheus query using the configured metric name and labels
46+
3. **Calculates Energy**: Uses the formula `sum_over_time(metric_name{type_instance="value", uuid="vm-uuid"}[query_range]) * (step_seconds/3600) / 1000000` to convert microwatt power samples to Watt-hours
47+
4. **Creates Records**: Generates an `EnergyRecord` for each VM with energy consumption data and execution metrics
3848

39-
## Customizing the PromQL Query
49+
## Configuration Parameters
4050

41-
The query template can use `{{uuid}}` as a placeholder for the VM UUID. The default query assumes you have energy consumption metrics labeled with the VM UUID.
51+
- **prometheus_endpoint**: URL of the Prometheus server (default: `http://localhost:9090`)
52+
- **prometheus_metric_name**: Name of the metric to query (default: `prometheus_value`)
53+
- **prometheus_label_type_instance**: Value for the `type_instance` label used to filter metrics (default: `scaph_process_power_microwatts`)
54+
- **prometheus_step_seconds**: Frequency between samples in the time series, in seconds (default: `30`)
55+
- **prometheus_query_range**: Time range for the query (default: `1h`). Examples: `1h`, `6h`, `24h`
56+
- **prometheus_verify_ssl**: Whether to verify SSL certificates when connecting to Prometheus (default: `true`)
4257

43-
### Example Queries
58+
## Example Configurations
4459

45-
**For libvirt domain energy metrics:**
46-
```promql
47-
sum(rate(libvirt_domain_info_energy_consumption_joules_total{uuid=~"{{uuid}}"}[5m])) * 300 / 3600000
48-
```
60+
### For Scaphandre Energy Metrics
4961

50-
**For per-VM IPMI power metrics:**
51-
```promql
52-
avg_over_time(ipmi_power_watts{instance=~".*{{uuid}}.*"}[5m]) * 5 * 60 / 1000 / 3600
53-
```
62+
Scaphandre exports power metrics in microwatts:
5463

55-
**For VM-specific RAPL energy metrics:**
56-
```promql
57-
sum(rate(node_rapl_package_joules_total{vm_uuid="{{uuid}}"}[5m])) * 300 / 3600000
64+
```ini
65+
[prometheus]
66+
prometheus_endpoint = http://prometheus.example.com:9090
67+
prometheus_metric_name = prometheus_value
68+
prometheus_label_type_instance = scaph_process_power_microwatts
69+
prometheus_step_seconds = 30
70+
prometheus_query_range = 6h
71+
prometheus_verify_ssl = false
5872
```
5973

60-
**For Scaphandre per-process metrics:**
61-
```promql
62-
sum(rate(scaph_process_power_consumption_microwatts{exe=~".*qemu.*",cmdline=~".*{{uuid}}.*"}[5m])) * 300 / 1000000 / 3600
74+
### For Custom Energy Metrics
75+
76+
If you have custom energy metrics with different labels:
77+
78+
```ini
79+
[prometheus]
80+
prometheus_endpoint = http://prometheus.example.com:9090
81+
prometheus_metric_name = my_custom_power_metric
82+
prometheus_label_type_instance = my_power_label_value
83+
prometheus_step_seconds = 60
84+
prometheus_query_range = 1h
85+
prometheus_verify_ssl = true
6386
```
6487

6588
## Energy Record Format
6689

6790
The Prometheus extractor generates `EnergyRecord` objects with the following fields:
6891

69-
- `uuid`: Unique identifier for the record
70-
- `measurement_time`: Timestamp when the measurement was taken
71-
- `site_name`: Name of the site (from configuration)
72-
- `user_id`: User identifier (optional for energy records)
73-
- `group_id`: Project/group identifier
74-
- `user_dn`: User Distinguished Name (optional)
75-
- `fqan`: Fully Qualified Attribute Name (VO mapping)
76-
- `energy_consumption`: Energy consumption value (in kWh)
77-
- `energy_unit`: Unit of measurement (default: "kWh")
78-
- `compute_service`: Service name (from configuration)
92+
- `ExecUnitID`: VM UUID
93+
- `StartExecTime`: Start time of the measurement period (ISO 8601 format)
94+
- `EndExecTime`: End time of the measurement period (ISO 8601 format)
95+
- `EnergyWh`: Energy consumption in Watt-hours
96+
- `Work`: CPU hours (CPU duration in hours)
97+
- `Efficiency`: Efficiency factor (placeholder value)
98+
- `WallClockTime_s`: Wall clock time in seconds
99+
- `CpuDuration_s`: CPU duration in seconds (wall time × vCPUs)
100+
- `SuspendDuration_s`: Suspend duration in seconds
101+
- `CPUNormalizationFactor`: CPU normalization factor
102+
- `ExecUnitFinished`: 0 if running, 1 if stopped
103+
- `Status`: VM status (active, stopped, etc.)
104+
- `Owner`: VO/project owner
105+
- `SiteName`: Site name (from configuration)
106+
- `CloudComputeService`: Service name (from configuration)
107+
- `CloudType`: Cloud type (e.g., "openstack")
79108

80109
## Integration with Messengers
81110

@@ -93,13 +122,13 @@ The records will be serialized as JSON with field mapping according to the accou
93122
To test your Prometheus extractor configuration:
94123

95124
1. Verify Prometheus is accessible from cASO
96-
2. Test your PromQL query directly in Prometheus UI with a sample UUID
125+
2. Test your metric exists in Prometheus UI with a sample VM UUID
97126
3. Run cASO with the `--dry-run` option to preview records without publishing
98127
4. Check the logs for any errors or warnings
99128

100129
## Example
101130

102-
Here's a complete example configuration:
131+
Here's a complete example configuration for Scaphandre metrics:
103132

104133
```ini
105134
[DEFAULT]
@@ -110,8 +139,11 @@ messengers = ssm
110139
111140
[prometheus]
112141
prometheus_endpoint = http://prometheus.example.com:9090
113-
prometheus_query = sum(rate(libvirt_domain_info_energy_consumption_joules_total{uuid=~"{{uuid}}"}[5m])) * 300 / 3600000
114-
prometheus_timeout = 30
142+
prometheus_metric_name = prometheus_value
143+
prometheus_label_type_instance = scaph_process_power_microwatts
144+
prometheus_step_seconds = 30
145+
prometheus_query_range = 6h
146+
prometheus_verify_ssl = false
115147
116148
[ssm]
117149
output_path = /var/spool/apel/outgoing/openstack
@@ -121,22 +153,37 @@ output_path = /var/spool/apel/outgoing/openstack
121153

122154
**No records extracted:**
123155
- Verify Prometheus is accessible
124-
- Check that your query returns results in Prometheus UI (replace {{uuid}} with an actual VM UUID)
125-
- Ensure the time range (extract_from/extract_to) covers periods with data
156+
- Check that your metric exists in Prometheus UI
157+
- Ensure the metric has data for the configured time range
126158
- Verify VMs exist in the configured projects
159+
- Check that the metric has the required labels (`type_instance` and `uuid`)
127160

128161
**Connection timeout:**
129-
- Increase `prometheus_timeout` value
130162
- Check network connectivity to Prometheus
131163
- Verify Prometheus is not overloaded
164+
- If using SSL, ensure certificates are valid or set `prometheus_verify_ssl = false`
132165

133166
**Invalid query results:**
134-
- Ensure your query returns numeric values
135-
- Check the query format matches PromQL syntax
136-
- Verify the metrics exist in your Prometheus instance for the VMs
137-
- Test the query with a real VM UUID in Prometheus UI
167+
- Ensure your metric contains instantaneous power values in microwatts
168+
- Check that the metric has the `uuid` label matching VM UUIDs
169+
- Verify the `type_instance` label matches your configuration
170+
- Test the query in Prometheus UI: `sum_over_time(prometheus_value{type_instance="scaph_process_power_microwatts", uuid="<vm-uuid>"}[1h])`
138171

139172
**No VMs found:**
140173
- Verify the projects are correctly configured in cASO
141174
- Check that VMs exist in the OpenStack environment
142175
- Ensure cASO has proper credentials to query Nova
176+
177+
## Technical Details
178+
179+
The energy calculation uses the following formula:
180+
181+
```
182+
energy_wh = sum_over_time(metric{labels}[range]) * (step_seconds / 3600) / 1000000
183+
```
184+
185+
Where:
186+
- Multiplying the summed samples by `step_seconds` gives µW·s; dividing by `3600` converts µW·s to µWh
187+
- Division by `1000000` converts µWh to Wh
188+
189+
This approach works with metrics that export instantaneous power consumption in microwatts, sampled at the configured frequency.
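
As a rough illustration of the formula above, the following Python sketch builds the PromQL expression from the documented options and converts a `sum_over_time` result to Wh. The helper names (`build_query`, `energy_wh`), the VM UUID, and the sample values are hypothetical and are not taken from cASO's source. Only the formula and the option names come from the documentation in this commit.

```python
def build_query(metric_name, type_instance, vm_uuid, query_range):
    """Build the sum_over_time expression described in the docs.

    Hypothetical helper: the real extractor may assemble its query differently.
    """
    return (
        f"sum_over_time({metric_name}"
        f'{{type_instance="{type_instance}", uuid="{vm_uuid}"}}'
        f"[{query_range}])"
    )


def energy_wh(sum_microwatts, step_seconds):
    """Convert a sum of microwatt power samples to Watt-hours.

    Each sample is assumed to represent `step_seconds` of constant power.
    """
    micro_wh = sum_microwatts * (step_seconds / 3600)  # sum of µW samples -> µWh
    return micro_wh / 1_000_000  # µWh -> Wh


# Assumed example: a VM drawing a constant 5 W (5,000,000 µW),
# sampled every 30 seconds for one hour (120 samples).
samples = [5_000_000] * 120
query = build_query(
    "prometheus_value", "scaph_process_power_microwatts", "example-vm-uuid", "1h"
)
print(query)
print(round(energy_wh(sum(samples), 30), 6))
```

With these assumed values, 120 samples of 5,000,000 µW sum to 600,000,000 µW, which the formula converts to about 5 Wh, matching a constant 5 W load over one hour. The result is only as accurate as the match between `prometheus_step_seconds` and the actual sampling interval of the metric.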
