Conversation

@w1ll-i-code
Contributor

PR for #10576. For now, this will be hidden behind a build-flag until the feature is sufficiently tested and ready for production use. For all the changes made to the original elasticsearch writer, please consult the individual commits.

Copy the elasticsearchwriter to be adapted for the new datastreamwriter.
Set up the build system and hide the feature behind a build flag for now.
Restructure the data sent to Elasticsearch to align with the Elastic
Common Schema specification and separate document indices by check to
reduce the number of distinct fields per index.
Handle the errors returned by Elasticsearch gracefully to let the user
know what went wrong during execution. Do not discard data as soon as a
request to Elasticsearch fails, but keep retrying until the data can be
sent, relying on the WorkQueue for back pressure. Improve the handling
of the flush timer by rescheduling the timer after each flush, making
sure that there are no needless flushes under heavy load.
Re-add support for tags, but make them conform to the ECS specification.
Also add support for labels, which are the former tags in the
elasticsearchwriter.

ref: https://www.elastic.co/docs/reference/ecs/ecs-base
Allow the user to filter which data should be sent to Elasticsearch.
This allows the user to use multiple datastreams to send the data to
different namespaces, for multi-tenancy or different retention policies.
Also accept an ApiKey for authentication with Elasticsearch, in addition
to username + password and certificates.
Icinga2 should manage its own index template if possible. This allows us
to ship potential additions to the document later without requiring
further user input. However, the user has the option to disable the
feature, should they want to manage the template manually. For user
customization, we ship the `icinga2@custom` component template, so that
users can change the behaviour of the template without having to edit
the managed one, making updates easier.
Add a template config with all possible config options for the user as
well as a short description of what each parameter does and how it can
be used. This allows a user to quickly configure the writer without
having to look up lots of documentation online.
Update the documentation to give a clearer overview of the new
elasticsearchdatastreamwriter feature and add the object and its fields
to the syntax highlighting of nano and vim.
Drop messages in the data buffer if the connection to Elasticsearch
fails. This guarantees that the icinga2 process can still shut down,
even with a misconfigured writer or if Elasticsearch is down or
unreachable, without stalling.
Allow the 'datastream_namespace' variable to contain macros and expand
them properly. This allows data to be written into different datastreams
based on object properties, e.g. by zone or custom var.
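
For illustration, a writer routing by a custom variable might look like the following sketch (the host/port attributes and the tenant custom var are assumptions; only datastream_namespace and the resulting naming scheme come from this PR):

object ElasticsearchDatastreamWriter "elasticsearch" {
  host = "127.0.0.1"
  port = 9200

  /* Hypothetical value: expands per checkable, so results land in
   * metrics-icinga2.<check>-<tenant>.
   */
  datastream_namespace = "$host.vars.tenant$"
}
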
@cla-bot cla-bot bot added the cla/signed label Oct 7, 2025
Contributor

@mcodato mcodato left a comment

I'm not sure about putting this feature behind a build-flag that is OFF by default.
IMHO, having it compiled in by default when the PR is merged could make adoption easier and faster. The option to enable or disable the feature already exists.
WDYT @lippserd ?

Contributor

@jschmidt-icinga jschmidt-icinga left a comment

I just gave this a quick look, so this is not a complete review.

Firstly, I agree with @mcodato that the build flag is unnecessary. It can just be on by default, same as all the other perfdata writers.

Secondly, I'd suggest you squash down your commits because they're all operating on the same new component added by this PR. Also there's some whitespace back-and-forth cluttering up the diff.

See below for some additional comments on the code, mostly things the linter complained about:

@w1ll-i-code
Contributor Author

#10577 (review) Hi, thanks for having a look at it, I really appreciate it.

I'd suggest you squash down your commits [...]

I left the commits separate deliberately, so it's easier to review. Each commit can be looked at on its own, without having to reason about all changes at once.

@w1ll-i-code w1ll-i-code force-pushed the wp_elasticsearchdatastreamwriter branch 6 times, most recently from 4a9699b to 9a83f44 Compare October 21, 2025 15:06
@jschmidt-icinga
Contributor

Thank you for this PR. I've briefly spoken to @lippserd about this and I'm going to take another look when I can. I know next to nothing about Elasticsearch yet, so it might take some time until I can test this thoroughly.

@martialblog
Member

martialblog commented Oct 23, 2025

Hi,

Just a hint, I recently addressed another issue with the ElasticsearchWriter format here: #10511

I provided a small but breaking change here: #10518

If the ElasticsearchDatastreamWriter is introduced, both Writers should use the same format, right?

@jschmidt-icinga let me know if you need help with Elasticsearch knowhow, you know where my office is.

@martialblog
Member

FYI, I think this would also (somewhat) address this issue: #9837

With datastreams the backing indices are managed by Elasticsearch.

if (!pdv->GetCrit().IsEmpty() && GetEnableSendThresholds())
	metric->Set("crit", pdv->GetCrit());

pd_fields->Set(pdv->GetLabel(), metric);
Member

@martialblog martialblog Oct 23, 2025

Hi,

Am I reading this correctly?
This would result in a schema where the perfdata label is part of the field name?

Like so:

"_source": {
  "timestamp": "1761225482",
  "perfdata.rta": {
      "value": 0.091,
      "warn": 100,
      "min": 0,
      "crit": 200,
      "unit": "ms"
  },
  "perfdata.pl": {
      "value": 0,
      "warn": 5,
      "min": 0,
      "crit": 15,
      "unit": "%"
  }
}

I think this might cause issues in the longterm, as described here: #6805 and #10511

Since each new field will cause a new mapping in the index. Correct me if I'm wrong.

The issue with adding new fields for every label is this: https://www.elastic.co/docs/troubleshoot/elasticsearch/mapping-explosion

In the change to the Elasticwriter I proposed, I used a field called "label" in a list of objects.

Like so:

"_source": {
  "timestamp": "1761225482",
  "perfdata": [
    {
      "value": 0.091,
      "label": "rta"
      "warn": 100,
      "min": 0,
      "crit": 200,
      "unit": "ms"
    },
    {
      "value": 0,
      "label": "pl"
      "warn": 5,
      "min": 0,
      "crit": 15,
      "unit": "%"
    }
  ]
}

This would just create a single field. And there is an expected field name where to find the label.

Otherwise you need to fetch the entire document, scan for all of the fields that start with "perfdata." to find the perfdata fields... and would still need to split the key name at the . to get to the label, right?

This field can then be mapped as either object or nested:
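
A minimal sketch of such a nested mapping (field types are assumed for illustration):

"perfdata": {
  "type": "nested",
  "properties": {
    "label": { "type": "keyword" },
    "value": { "type": "double" },
    "warn":  { "type": "double" },
    "crit":  { "type": "double" },
    "unit":  { "type": "keyword" }
  }
}

Dropping "type": "nested" would give the default object behaviour instead.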

Member

Just to make things a bit clearer: when using an array, using nested here might be better, since it maintains the independence of each object.

For example, when searching all perfdata for an object:

curl -X POST "http://localhost:9200/icinga2/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "host": "myhost" }},
        { "match": { "service": "ping6" }}
      ]
    }
  },
   "fields": [
     "check_result.perfdata.*"
   ],
   "_source": false
}
'

An object mapping returns this:

{
  "_id": "AZoRQYZWWEqPxFmVW73R",
  "fields": {
    "check_result.perfdata.unit": [ "ms", "%" ],
    "check_result.perfdata.label": [ "rta", "pl" ],
    "check_result.perfdata.warn": [ 100, 5 ],
    "check_result.perfdata.min": [ 0, 0 ],
    "check_result.perfdata.value": [ 0, 0 ],
    "check_result.perfdata.crit": [ 200, 15 ]
  }
}

And a nested mapping returns:

{
  "_id": "AZoRUkSmucmeEsXVMUwm",
  "fields": {
    "check_result.perfdata": [
      {
        "warn": [ 100 ],
        "unit": [ "ms" ],
        "min": [ 0 ],
        "crit": [ 200 ],
        "label": [ "rta" ],
        "value": [ 0 ]
      },
      {
        "warn": [ 5 ],
        "unit": [ "%" ],
        "min": [ 0 ],
        "crit": [ 15 ],
        "label": [ "pl" ],
        "value": [ 0 ]
      }
    ]
  }
}

Contributor Author

I think I know the puzzle piece you are missing:
We are indexing the documents based on the check, similar to what elasticsearch already does (e.g. beats might store data in metrics-system.diskio-master).

If you read the config file, you might realize that we are doing something similar with metrics-icinga2.<check>-<namespace>. Since the same check should always produce the same output, we won't have an explosion of fields.

Contributor Author

Also, neither object nor nested work with timeseries datastreams. (If you know any better, feel free to correct me on that.)

Member

Ah ok I see. Thanks for the clarification. So each check would get its own datastream and thus indices. 👍 I hadn't had the time to compile and test the new writer yet, just skimmed over the code.

Are there any things we need to consider when having a datastream per check? There also is a limit on the number of shards per node. https://www.elastic.co/docs/reference/elasticsearch/configuration-reference/miscellaneous-cluster-settings

Might not be an issue, just wanted to mention it. And users can always add more data nodes to scale up.

Member

Finally had some time to test it. Looks good.

curl localhost:9200/_cat/indices
yellow open .ds-metrics-icinga2.swap-default-2025.10.30-000001      CBEhsKj5QweXa12A4PFafQ 1 1 17 0 224.4kb 224.4kb 224.4kb
yellow open .ds-metrics-icinga2.ping4-default-2025.10.30-000001     fe8buP9QROOWv5xt6qxI5A 1 1 16 0 221.3kb 221.3kb 221.3kb
yellow open .ds-metrics-icinga2.ping6-default-2025.10.30-000001     kqhy-33dRl2CtNw2tWe7MQ 1 1 15 0 214.6kb 214.6kb 214.6kb
yellow open .ds-metrics-icinga2.ssh-default-2025.10.30-000001       LKgw--JYSF66eAEZNhZFIQ 1 1 15 0 213.8kb 213.8kb 213.8kb

I still wonder how the number of indices/shards might affect larger setups.

But the search does what it should do:

curl -X POST "localhost:9200/metrics-icinga2.load-default/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "service.name": "3ecab54cb896!load" }}
      ]
    }
  },
   "fields": [
     "perfdata.*"
   ],
   "_source": false
}
'

It's still a bit tricky to fetch the labels for each metric (load1, load5, etc.), since the label is part of a key, but it's an improvement over the current ElasticsearchWriter. Example result:

{
  "_index": ".ds-metrics-icinga2.load-default-2025.10.30-000001",
  "_id": "Cj19dEiEhn-MNA9xAAABmjWiFKQ",
  "_score": 0.03390155,
  "fields": {
    "perfdata.load15.min": [
      0.0
    ],
    "perfdata.load1.min": [
      0.0
    ],
    "perfdata.load5.value": [
      2.22
    ],
    "perfdata.load5.min": [
      0.0
    ],
    "perfdata.load15.value": [
      2.63
    ],
    "perfdata.load1.value": [
      2.16
    ]
  }
}

When I don't query for specific fields, the result is a bit simpler:

...
          "perfdata": {
            "load1": {
              "min": 0,
              "value": 2.3
            },
            "load15": {
              "min": 0,
              "value": 2.2
            },
            "load5": {
              "min": 0,
              "value": 2
            }
          }
...

Overall very nice solution.

"mode": "time_series",
"routing_path": "check.checkable",
"mapping": {
"total_fields.limit": 2000
Member

I don't think we want to do this. The default limit of 1000 is high enough and instead we should avoid writing so many fields.

By that I mean, instead of using the perfdata label in a field name, we should have it in a field "label"

Contributor Author

We should not get much above this limit, but in my tests I found that the icingadb check in particular writes a lot of performance data into its datastream metrics-icinga2.icingadb-default. I don't think any other check will go far beyond that.

}
}
},
"mappings": {
Member

@martialblog martialblog Oct 27, 2025

I was wondering if we could use a mapping closer to the current ElasticsearchWriter?

It should contain everything we need for a timeseries. Instead of a new "checkable" field, we could add "time_series_dimension": true to both the host and service fields, and probably also the checkcommand.

For example:

{
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "check_command": {
          "type": "keyword",
          "time_series_dimension": true
        },
        "check_result": {
          "properties": {
            "check_source": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "command": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "execution_end": {
              "type": "date"
            },
            "execution_start": {
              "type": "date"
            },
            "execution_time": {
              "type": "float"
            },
            "exit_status": {
              "type": "long"
            },
            "latency": {
              "type": "float"
            },
            "output": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "perfdata": {
                "type": "nested",
                "properties": {
                    "crit": {
                        "type": "long"
                    },
                    "label": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "max": {
                        "type": "long"
                    },
                    "min": {
                        "type": "long"
                    },
                    "unit": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "value": {
                        "type": "long"
                    },
                    "warn": {
                        "type": "long"
                    }
                }
            },
            "schedule_end": {
              "type": "date"
            },
            "schedule_start": {
              "type": "date"
            },
            "state": {
              "type": "long"
            },
            "vars_after": {
              "properties": {
                "attempt": {
                  "type": "long"
                },
                "reachable": {
                  "type": "boolean"
                },
                "state": {
                  "type": "long"
                },
                "state_type": {
                  "type": "long"
                }
              }
            },
            "vars_before": {
              "properties": {
                "attempt": {
                  "type": "long"
                },
                "reachable": {
                  "type": "boolean"
                },
                "state": {
                  "type": "long"
                },
                "state_type": {
                  "type": "long"
                }
              }
            }
          }
        },
        "current_check_attempt": {
          "type": "long"
        },
        "host": {
          "type": "keyword",
          "time_series_dimension": true
        },
        "last_hard_state": {
          "type": "long"
        },
        "last_state": {
          "type": "long"
        },
        "max_check_attempts": {
          "type": "long"
        },
        "reachable": {
          "type": "boolean"
        },
        "service": {
          "type": "keyword",
          "time_series_dimension": true
        },
        "state": {
          "type": "long"
        },
        "state_type": {
          "type": "long"
        },
        "timestamp": {
          "type": "date"
        },
        "type": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

Member

This could be a _component_template that is used for both Writers.

Contributor Author

You are right about the checkable. We can make the host and service names timeseries dimensions. But since we are already writing checks into their own datastreams, I don't think the check_command needs to be a dimension as well.

Contributor

@jschmidt-icinga jschmidt-icinga left a comment

I've also commented a few more nitpicks, but for the moment I want to primarily focus on two areas:

  • Finalize a format/index-template that can also be used by the basic ElasticsearchWriter in the future. (Please see @martialblog's detailed comments in addition to mine.)
  • Minimize the impact this has on ProcessCheckResult execution times by moving most of the work into the work queue.

I'm open to suggestions, especially regarding the data format.

Comment on lines +34 to +37
/* Enable sending the threashold values as additional fields
* with the service check metrics. If set to true, it will
* send warn and crit for every performance data item.
*/
Contributor

Suggested change
/* Enable sending the threashold values as additional fields
* with the service check metrics. If set to true, it will
* send warn and crit for every performance data item.
*/
/* Enable sending the threshold values as additional fields
* with the service check metrics. If set to true, it will
* send warn and crit for every performance data item.
*/

Comment on lines +176 to +178
} catch (const StatusCodeException& es) {
	int status_code = es.GetStatusCode();
	if (status_code == static_cast<int>(http::status::bad_request)) {
Contributor

I still don't like this.
The underlying type of http::status is unsigned, so you should at least use that in StatusCodeException instead of int. However, even better would be to move the exception class to the cpp file and directly store the enum type.

Suggested change
} catch (const StatusCodeException& es) {
	int status_code = es.GetStatusCode();
	if (status_code == static_cast<int>(http::status::bad_request)) {
} catch (const StatusCodeException& es) {
	if (es.GetStatusCode() == http::status::bad_request) {
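
For illustration, a minimal sketch of the cpp-file variant that stores the enum type directly (only the class name and accessors mirror the PR; everything else is assumed):

#include <boost/beast/http/status.hpp>
#include <stdexcept>
#include <string>

namespace http = boost::beast::http;

class StatusCodeException : public std::runtime_error
{
public:
	StatusCodeException(http::status status, std::string message, std::string body)
		: std::runtime_error(std::move(message)), m_Status(status), m_Body(std::move(body))
	{ }

	// Returns the typed status, so callers can compare against http::status
	// enumerators without casts.
	http::status GetStatusCode() const noexcept { return m_Status; }
	const std::string& GetBody() const noexcept { return m_Body; }

private:
	http::status m_Status;
	std::string m_Body;
};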

Comment on lines +100 to +101
StatusCodeException(int status_code, String message, String body)
	: std::runtime_error(std::move(message)), m_StatusCode(status_code), m_Body(std::move(body)) {}
Contributor

I think the status_code should be part of the message passed to std::runtime_error, since DiagnosticInformation() will only print that, and you're not catching the StatusCodeException directly in every instance, so that information would get lost otherwise.

Contributor Author

Thanks, I did not know that. I'll make sure to add more information to the error message.

${ICINGA2_CONFIGDIR}/features-available
)


Contributor

Superfluous newline

namespace http = beast::http;

static void ExceptionHandler(const boost::exception_ptr& exp);
static Array::Ptr ExtractTemplateTags(const MacroProcessor::ResolverList& resolvers, const Array::Ptr& tags_tmpl,
Contributor

tags_tmpl vs. tag_tmpl in the definition.

return Convert::ToBool(m_CompiledFilter->Evaluate(frame));
}

void ElasticsearchDatastreamWriter::CheckResultHandler(const Checkable::Ptr& checkable, const CheckResult::Ptr& cr)
Contributor

Since this function is executed synchronously during ProcessCheckResult, which is a bottleneck as it is, this should only do as much as it really needs to, then enqueue the rest of the work in the work queue, similarly to ElasticsearchWriter since #10420.

Essentially, just the state-information from the checkable needs to be gathered beforehand, so basically set those fields of ecs_host and ecs_service here, then pass it into the work queue, fill out the rest and build the document there.

Contributor Author

Thanks, I actually never ran into the issue where ProcessCheckResult was the bottleneck and wanted to optimize for write speed to elasticsearch instead by having the values pre-computed. I'll restructure it as you suggested.

Contributor

ProcessCheckResult is probably the busiest crossing in the entire program, depending on what features are enabled and how many checks are performed at which intervals. While it may look like it's blocking per-checkable, it's essentially blocking per-endpoint connection. We have been talking about making it process check results in parallel, at least per-checkable, but I don't think anybody is currently working on that.

I don't think write speed to elasticsearch will be impacted by moving these things onto the queue, since you have to go through the queue once anyway (currently just for sending), and at that point it doesn't matter on which side of the work queue the data is put together.
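
A rough sketch of the suggested split (the Build*/Buffer* helpers and the m_WorkQueue member are illustrative assumptions, not the PR's actual code):

// Gather only the state that must be read synchronously, then defer the
// expensive work (building and buffering the document) to the work queue.
void ElasticsearchDatastreamWriter::CheckResultHandler(const Checkable::Ptr& checkable, const CheckResult::Ptr& cr)
{
	auto [host, service] = GetHostService(checkable);

	// Read the volatile checkable state now...
	Dictionary::Ptr ecsHost = BuildEcsHost(host);                              // hypothetical helper
	Dictionary::Ptr ecsService = service ? BuildEcsService(service) : nullptr; // hypothetical helper

	// ...and assemble everything else asynchronously, off the
	// ProcessCheckResult hot path.
	m_WorkQueue.Enqueue([this, ecsHost, ecsService, cr]() {
		Dictionary::Ptr document = BuildDocument(ecsHost, ecsService, cr); // hypothetical helper
		BufferDocument(document);                                          // hypothetical helper
	});
}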

Comment on lines +382 to +408
Dictionary::Ptr ecs_host = new Dictionary({
	{"name", host->GetDisplayName()},
	{"hostname", host->GetName()},
	{"soft_state", host->GetState()},
	{"hard_state", host->GetLastHardState()}
});
if (!host->GetZoneName().IsEmpty()) ecs_host->Set("zone", host->GetZoneName());

char _addr[16];
if (!host->GetAddress().IsEmpty() && inet_pton(AF_INET, host->GetAddress().CStr(), _addr) == 1) {
	ecs_host->Set("ip", host->GetAddress());
} else if (!host->GetAddress6().IsEmpty() && inet_pton(AF_INET6, host->GetAddress6().CStr(), _addr) == 1) {
	ecs_host->Set("ip", host->GetAddress6());
} else if (!host->GetAddress().IsEmpty()) {
	ecs_host->Set("fqdn", host->GetAddress());
}

Dictionary::Ptr ecs_service;
if (service) {
	ecs_service = new Dictionary({
		{"name", service->GetName()},
		{"display_name", service->GetDisplayName()},
		{"soft_state", service->GetState()},
		{"hard_state", service->GetLastHardState()}
	});
	if (!service->GetZoneName().IsEmpty()) ecs_service->Set("zone", service->GetZoneName());
}
Contributor

ecs_service and ecs_host are identical in structure, and in case this check result is for a service, the host's state isn't really relevant anyway. On the other hand, we're missing information about reachability and the current/max attempts. This also seems to make it harder to evaluate on the consumer side, given that they would need to check if a service exists and then work with the data from either the "host" or the "service" subobject.

Unless there is a good reason for doing it this way, I'd suggest sticking with the way it's done in the old ElasticsearchWriter, have a generic checkable object ecs_checkable with a host field and an optional service field.

Contributor Author

I split them like this because the host and service objects have a special meaning in the Elastic Common Schema.

Contributor

Ok, then it makes perfect sense to fill them out this way. I'm still very new to Elasticsearch and will have to look at the ECS. Up to now I was operating under the assumption that the structure of the data sent to Elasticsearch was entirely arbitrary.

Could you please add a link to the relevant documentation for these fields in the code where they are filled out? I think it would be helpful as a reminder and for other people reading the code that are as unfamiliar with es as myself. 😄


String checkable_name = service == nullptr ? host->GetName() : service->GetName();
Dictionary::Ptr check_result = new Dictionary({
	{"checkable", checkable_name},
Contributor

This seems to be redundant, given that we already have this information in the host/service/checkable object.

Contributor Author

Yes, @martialblog has mentioned that as well. We can remove that.

@w1ll-i-code w1ll-i-code force-pushed the wp_elasticsearchdatastreamwriter branch from 9a83f44 to 9b202be Compare October 28, 2025 07:40
@w1ll-i-code w1ll-i-code force-pushed the wp_elasticsearchdatastreamwriter branch 2 times, most recently from 4f050e7 to 9e06023 Compare October 28, 2025 07:41
General improvements to code and documentation. Fixing comments by:
- Mattia Codato
- Johannes Schmidt
@w1ll-i-code w1ll-i-code force-pushed the wp_elasticsearchdatastreamwriter branch from 9e06023 to 5c270a5 Compare October 28, 2025 07:42
@w1ll-i-code
Contributor Author

Thanks @martialblog for the insight. I'd encourage you to read the commit messages, as I tried to lay out my reasoning for some decisions there. Also, take a look at the example config file, as I think it clears up some of your concerns.

Contributor

@jschmidt-icinga jschmidt-icinga left a comment

I took a look at the filter option today and I think that's a very good idea to add something like this to a perfdata writer. We should definitely update the other writers with something similar when we get the chance.

BOOST_THROW_EXCEPTION(ValidationError(this, { "filter" }, "Filter must be an expression."));
}


Contributor

Superfluous newline

Comment on lines +822 to +823
std::unique_ptr<Expression>(MakeLiteral(filter)),
std::unique_ptr<Expression>(MakeLiteral("call"))
Contributor

This should be indented.

}

void ElasticsearchDatastreamWriter::ValidateFilter(const Lazy<Value> &lvalue, const ValidationUtils &utils) {
	Value filter = lvalue();
Contributor

No need to take this by value, make it a const reference.

Comment on lines +825 to +827
FunctionCallExpression *fexpr = new FunctionCallExpression(std::move(indexer), std::move(args));

m_CompiledFilter = std::move(fexpr);
Contributor

Just assign this directly. Move doesn't do anything here, since this is a raw pointer.
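
For example, the direct form would be:

// std::move on a raw pointer is a no-op, so the temporary can be dropped:
m_CompiledFilter = new FunctionCallExpression(std::move(indexer), std::move(args));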

* You can use any attribute of the host, service, checkable or
* checkresult (cr) objects in the filter expression.
*/
// filter = {{ "host.name == 'myhost' || service.name == 'myservice'" }}
Contributor

Why is this quoted inside the lambda? Furthermore, if you remove the outer quotes, the inner single-quotes are not valid DSL syntax.
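
Presumably the intended form, with the outer quotes removed and DSL string literals, would be something like:

// filter = {{ host.name == "myhost" || service.name == "myservice" }}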

Comment on lines +331 to +333
Host::Ptr host;
Service::Ptr service;
tie(host, service) = GetHostService(checkable);
Contributor

You can use structured bindings, like so:

Suggested change
Host::Ptr host;
Service::Ptr service;
tie(host, service) = GetHostService(checkable);
auto [host, service] = GetHostService(checkable);

Comment on lines +336 to +339
frameNS->Set("checkable", checkable);
frameNS->Set("check_result", cr);
frameNS->Set("host", host);
frameNS->Set("service", service);
Contributor

"checkable" and "check_result" are redundant here. The user would need some logic anyway to know if they're dealing with a host or service, and using this would just reverse that logic. Sure, you could do something like match(checkable.name, {{{test*}}}) to get both the host and all its services, but at the cost of introducing a new convention, since we mostly only bind "host", "service" and "obj" in cases like the API-filters.

And the check result can also be retrieved via (host|service).last_check_result.
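
For example, a filter could reach the check result through the bound objects directly (a sketch using the standard last_check_result attribute):

// filter = {{ host.last_check_result.exit_status == 0 }}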

Comment on lines +360 to +362
Host::Ptr host;
Service::Ptr service;
tie(host, service) = GetHostService(checkable);
Contributor

Also use structured bindings here please.

Comment on lines +12 to +17
/* The datastream namespace to use. This can be used to separate different
* Icinga instances. Or for letting multiple Writers write to different
* datastreams in the same Elasticsearch cluster. By using the filter option.
* The Elasticsearch datastream name will be
* "metrics-icinga2.{check}-{datastream_namespace}".
*/
Contributor

The block comments in this file are a bit inconsistent. Can you please ensure they're in this format:

/* Foo
 * bar
 */

Sorry for nitpicking again 😄

> **Note**
>
> This feature requires Elasticsearch to support timeseries datastreams and have the ecs component template installed.
> This feature was tested successfully with Elasticsearch 8.12 and 9.0.8.
Member

Do we need to explicitly mention that the ElasticsearchDatastreamWriter will not work with OpenSearch? Since the current ElasticsearchWriter works with OpenSearch, users might expect the same here.

Member

Did some testing with OpenSearch; luckily there is little that needs to change for this implementation to work with OpenSearch.

  • The index-template.json needs to be changed; users can do that themselves
  • OpenSearch responds with charset=UTF-8, not charset=utf-8

0001-Opensearch.patch

Member

@w1ll-i-code Can we loosen the charset requirement? Modern JSON should use UTF-8 anyway. https://www.rfc-editor.org/rfc/rfc8259

JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8

A "tolower()" might also work, but maybe @jschmidt-icinga has some ideas on how the codebase should handle it.

I can take care of the documentation regarding OpenSearch once this PR is merged.
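
One way to loosen it, as a sketch (the function name is made up; where exactly the writer parses the Content-Type header may differ):

#include <boost/algorithm/string/predicate.hpp>
#include <string>

// Compare the Content-Type charset case-insensitively instead of expecting
// the exact string "utf-8", so "UTF-8" from OpenSearch also passes.
static bool IsUtf8Charset(const std::string& charset)
{
	return boost::algorithm::iequals(charset, "utf-8");
}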

Comment on lines +633 to +641
if (response.result_int() > 299) {
	Log(LogCritical, "ElasticsearchDatastreamWriter")
		<< "Unexpected response code " << response.result_int() << " from URL '" << url->Format() << "'. Error: " << response.body();
	BOOST_THROW_EXCEPTION(StatusCodeException(
		response.result_int(),
		"Unexpected response code from Elasticsearch",
		response.body()
	));
}
Contributor

You should not unconditionally log this as an error. For example, if the template already exists, a 400 error is returned without this being an error on our side that the user should be aware of.

So either make this LogDebug here and log errors when appropriate in the catch block outside TrySend (which you already do), or ensure we never expect error status codes unless something actually goes wrong, for example by checking if the template already exists before sending it.
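
For reference, Elasticsearch exposes a HEAD endpoint to probe for an existing index template, so the writer could check before sending (template name assumed):

curl -I "localhost:9200/_index_template/icinga2"
# 200 if the template exists, 404 otherwise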

@martialblog
Member

@w1ll-i-code Maybe we need to handle the fact that CheckCommand objects can have names that are not allowed in index names.

For example:

object CheckCommand "example_FOOBAR_bar-foo" {
  import "ping-common"
  command += [ "-4" ]
  vars.ping_address = "$address$"
}

apply Service "ping1" {
 import "generic-service"
 check_command = "example_FOOBAR_bar-foo"
 assign where host.address
}

object CheckCommand "i like icinga" {
  import "ping-common"
  command += [ "-4" ]
  vars.ping_address = "$address$"
}

apply Service "foobar" {
 import "generic-service"
 check_command = "i like icinga"
 assign where host.address
}

This will result in:

[2025-10-31 13:15:29 +0000] warning/ElasticsearchDatastreamWriter: Error during document creation: illegal_argument_exception: data_stream [metrics-icinga2.example_FOOBAR_bar-foo-default] must be lowercase

[2025-10-31 13:15:51 +0000] warning/ElasticsearchDatastreamWriter: Error during document creation: illegal_argument_exception: data_stream [metrics-icinga2.i like icinga-default] must not contain the following characters ['?','*','<','"',' ','\\','/',',','|','>']

Not sure yet what a good solution might be. Maybe some form of normalization, maybe just letting the users know they need to be careful with the object names, maybe both.
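
One possible normalization, as a sketch (not part of the PR; the forbidden character set mirrors the error message above):

#include <cctype>
#include <string>

// Lowercase the check command name and replace characters Elasticsearch
// forbids in data stream names with '_'.
static std::string NormalizeDatastreamToken(std::string name)
{
	static const std::string forbidden = "\\/*?\"<>| ,";
	for (char& c : name) {
		if (forbidden.find(c) != std::string::npos)
			c = '_';
		else
			c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
	}
	return name;
}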
