Create a new ElasticsearchDatastreamWriter to more efficiently store data in elasticsearch. #10577
Conversation
Copy the elasticsearchwriter to be adapted for the new datastreamwriter. Set up the build system and hide the feature behind a build flag for now.
Restructure the data sent to Elasticsearch to align with the Elastic Common Schema specification and separate document indices by check to reduce the number of distinct fields per index.
Handle the errors returned by Elasticsearch gracefully to let the user know what went wrong during execution. Do not discard data as soon as a request to Elasticsearch fails, but keep retrying until the data can be sent, relying on the WorkQueue for back pressure. Improve the handling of the flush timer by rescheduling the timer after each flush, making sure that there are no needless flushes under heavy load.
Re-add support for tags, but make them conform to the ECS specification. Also add support for labels, which are the former tags in the elasticsearchwriter. ref: https://www.elastic.co/docs/reference/ecs/ecs-base
Allow the user to filter which data should be sent to Elasticsearch. This allows the user to use multiple datastreams to send the data to different namespaces, for multi-tenancy or different retention policies.
Also accept an API key for authentication against Elasticsearch, in addition to username + password and certificates.
Icinga2 should manage its own index template if possible. This allows us to ship potential additions to the document later without requiring further user input. However, the user has the option to disable the feature, should they want to manage the template manually. For user customization, we ship the `icinga2@custom` component template, so that users can change the behaviour of the template without having to edit the managed one, making updates easier.
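For illustration, a user override through that component template might look like this (the `_component_template` API is standard Elasticsearch; the ILM policy name here is a made-up example, not a shipped default):

curl -X PUT "localhost:9200/_component_template/icinga2@custom" -H 'Content-Type: application/json' -d'
{
  "template": {
    "settings": {
      "index.lifecycle.name": "my-custom-ilm-policy"
    }
  }
}
'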
Add a template config with all possible config options for the user, as well as a short description of what each parameter does and how it can be used. This allows a user to quickly configure the writer without having to look up lots of documentation online.
Update the documentation to give a clearer overview of the new ElasticsearchDatastreamWriter feature and add the object and its fields to the syntax highlighting for nano and vim.
Drop messages in the data buffer if the connection to Elasticsearch fails. This guarantees that the Icinga2 process can still shut down without stalling, even with a misconfigured writer or if Elasticsearch is down or unreachable.
Allow the 'datastream_namespace' variable to contain macros and expand them properly. This allows data to be written into different datastreams based on object properties, e.g. by zone or custom var.
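As a hedged sketch of that last point (the object type and datastream_namespace come from this PR; the host, port and the tenant custom var are assumptions by analogy with the existing ElasticsearchWriter):

object ElasticsearchDatastreamWriter "elasticsearch-datastream" {
  host = "127.0.0.1"
  port = 9200

  // Expanded per checkable, e.g. one datastream per tenant:
  // metrics-icinga2.<check>-<tenant>
  datastream_namespace = "$host.vars.tenant$"
}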
I'm not sure about putting this feature behind a build-flag that is OFF by default.
IMHO, having it compiled by default, when the PR is merged, could make adoption easier and faster. The option to enable or disable the feature already exists.
WDYT @lippserd ?
I just gave this a quick look, so this is not a complete review.
Firstly, I agree with @mcodato that the build flag is unnecessary. It can just be on by default, same as all the other perfdata writers.
Secondly, I'd suggest you squash down your commits because they're all operating on the same new component added by this PR. Also there's some whitespace back-and-forth cluttering up the diff.
See below for some additional comments on the code, mostly things the linter complained about:
Hi, thanks for having a look at it (#10577 (review)), I really appreciate it.
I left the commits separate deliberately, so it's easier to review. Each commit can be looked at on its own, without having to reason about all changes at once.
Force-pushed from 4a9699b to 9a83f44.
Thank you for this PR. I've briefly spoken to @lippserd about this and I'm going to take another look when I can. I know next to nothing about Elasticsearch yet, so it might take some time until I can test this thoroughly.
Hi, just a hint: I recently addressed another issue with the ElasticsearchWriter format in #10511 and provided a small but breaking change in #10518. If the ElasticsearchDatastreamWriter is introduced, both writers should use the same format, right? @jschmidt-icinga let me know if you need help with Elasticsearch know-how, you know where my office is.
FYI, I think this would also (somewhat) address issue #9837. With datastreams, the backing indices are managed by Elasticsearch.
if (!pdv->GetCrit().IsEmpty() && GetEnableSendThresholds())
	metric->Set("crit", pdv->GetCrit());

pd_fields->Set(pdv->GetLabel(), metric);
Hi,
Am I reading this correctly?
This would result in a schema where the perfdata label is in the field name?
Like so:
"_source": {
"timestamp": "1761225482",
"perfdata.rta": {
"value": 0.091,
"warn": 100,
"min": 0,
"crit": 200,
"unit": "ms"
},
"perfdata.pl": {
"value": 0,
"warn": 5,
"min": 0,
"crit": 15,
"unit": "%"
}
}I think this might cause issues in the longterm, as described here: #6805 and #10511
Since each new field will cause a new mapping in the index. Correct me if I'm wrong.
The issue with adding new fields for every label is this: https://www.elastic.co/docs/troubleshoot/elasticsearch/mapping-explosion
In the change to the ElasticsearchWriter I proposed, I used a field called "label" in a list of objects.
Like so:
"_source": {
"timestamp": "1761225482",
"perfdata": [
{
"value": 0.091,
"label": "rta"
"warn": 100,
"min": 0,
"crit": 200,
"unit": "ms"
},
{
"value": 0,
"label": "pl"
"warn": 5,
"min": 0,
"crit": 15,
"unit": "%"
}
]
}This would just create a single field. And there is an expected field name where to find the label.
Otherwise you need to fetch the entire document, scan for all of the fields that start with "perfdata." to find the perfdata fields... and would still need to split the key name at the . to get to the label, right?
This field can then be mapped as either object or nested:
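For illustration, a minimal nested mapping for such a perfdata array might look like this (field names taken from the example above; the concrete types are assumptions):

"perfdata": {
  "type": "nested",
  "properties": {
    "label": { "type": "keyword" },
    "value": { "type": "double" },
    "warn": { "type": "double" },
    "crit": { "type": "double" },
    "min": { "type": "double" },
    "unit": { "type": "keyword" }
  }
}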
Just to make things a bit clearer: when using an array, nested might be better here, since it maintains the independence of each object.
For example, when searching all perfdata for an object:
curl -X POST "http://localhost:9200/icinga2/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": [
{ "match": { "host": "myhost" }},
{ "match": { "service": "ping6" }}
]
}
},
"fields": [
"check_result.perfdata.*"
],
"_source": false
}
'

An object mapping returns this:
{
"_id": "AZoRQYZWWEqPxFmVW73R",
"fields": {
"check_result.perfdata.unit": [ "ms", "%" ],
"check_result.perfdata.label": [ "rta", "pl" ],
"check_result.perfdata.warn": [ 100, 5 ],
"check_result.perfdata.min": [ 0, 0 ],
"check_result.perfdata.value": [ 0, 0 ],
"check_result.perfdata.crit": [ 200, 15 ]
}
}

And a nested mapping returns:
{
"_id": "AZoRUkSmucmeEsXVMUwm",
"fields": {
"check_result.perfdata": [
{
"warn": [ 100 ],
"unit": [ "ms" ],
"min": [ 0 ],
"crit": [200 ],
"label": [ "rta" ],
"value": [ 0 ]
},
{
"warn": [ 5 ],
"unit": [ "%" ],
"min": [ 0 ],
"crit": [ 15 ],
"label": ["pl" ],
"value": [ 0 ]
}
]
}
I think I know the puzzle piece you are missing:
We are indexing the documents based on the check, similar to what elasticsearch already does (e.g. beats might store data in metrics-system.diskio-master).
If you read the config file, you might realize that we are doing something similar with metrics-icinga2.<check>-<namespace>. Since the same check should always produce the same output, we won't have an explosion of fields.
Also, neither object nor nested work with timeseries datastreams. (If you know any better, feel free to correct me on that.)
Ah ok, I see. Thanks for the clarification. So each check would get its own datastream and thus indices. 👍 I haven't had the time to compile and test the new writer yet, just skimmed over the code.
Are there any things we need to consider when having a datastream per check? There is also a limit on the number of shards per node. https://www.elastic.co/docs/reference/elasticsearch/configuration-reference/miscellaneous-cluster-settings
Might not be an issue, just wanted to mention it. And users can always add more data nodes to scale up.
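For reference, the effective limit can be inspected via the cluster settings API (cluster.max_shards_per_node defaults to 1000 per non-frozen data node):

curl -s "localhost:9200/_cluster/settings?include_defaults=true&filter_path=*.cluster.max_shards_per_node"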
Finally had some time to test it. Looks good.
curl localhost:9200/_cat/indices
yellow open .ds-metrics-icinga2.swap-default-2025.10.30-000001 CBEhsKj5QweXa12A4PFafQ 1 1 17 0 224.4kb 224.4kb 224.4kb
yellow open .ds-metrics-icinga2.ping4-default-2025.10.30-000001 fe8buP9QROOWv5xt6qxI5A 1 1 16 0 221.3kb 221.3kb 221.3kb
yellow open .ds-metrics-icinga2.ping6-default-2025.10.30-000001 kqhy-33dRl2CtNw2tWe7MQ 1 1 15 0 214.6kb 214.6kb 214.6kb
yellow open .ds-metrics-icinga2.ssh-default-2025.10.30-000001 LKgw--JYSF66eAEZNhZFIQ 1 1 15 0 213.8kb 213.8kb 213.8kb
I still wonder how the amount of indices/shards might affect larger setups.
But the search does what it should do:
curl -X POST "localhost:9200/metrics-icinga2.load-default/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": [
{ "match": { "service.name": "3ecab54cb896!load" }}
]
}
},
"fields": [
"perfdata.*"
],
"_source": false
}
'
It's still a bit tricky to fetch the labels for each metric (load1, load5, etc.), since they are part of the key, but it's an improvement over the current ElasticsearchWriter. Example result:
{
"_index": ".ds-metrics-icinga2.load-default-2025.10.30-000001",
"_id": "Cj19dEiEhn-MNA9xAAABmjWiFKQ",
"_score": 0.03390155,
"fields": {
"perfdata.load15.min": [
0.0
],
"perfdata.load1.min": [
0.0
],
"perfdata.load5.value": [
2.22
],
"perfdata.load5.min": [
0.0
],
"perfdata.load15.value": [
2.63
],
"perfdata.load1.value": [
2.16
]
}
}
When I don't query for the fields, it's a bit simpler:
...
"perfdata": {
"load1": {
"min": 0,
"value": 2.3
},
"load15": {
"min": 0,
"value": 2.2
},
"load5": {
"min": 0,
"value": 2
}
}
...
Overall very nice solution.
| "mode": "time_series", | ||
| "routing_path": "check.checkable", | ||
| "mapping": { | ||
| "total_fields.limit": 2000 |
I don't think we want to do this. The default limit of 1000 is high enough and instead we should avoid writing so many fields.
By that I mean: instead of using the perfdata label in a field name, we should have it in a field "label".
We should not get much over this, but in my tests I found the icingadb check in particular to write a lot of performance data into its datastream metrics-icinga2.icingadb-default. I don't think any other check will go far beyond that.
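If someone wants to verify this on their own setup, the mapping of that datastream can be dumped and its fields counted, e.g.:

curl -s "localhost:9200/metrics-icinga2.icingadb-default/_mapping?pretty"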
		}
	}
},
"mappings": {
I was wondering if we could use a mapping closer to the current ElasticsearchWriter?
It should contain everything we need for a timeseries. Instead of a new "checkable" field, we could add "time_series_dimension": true to both the host and service fields, probably also the check_command.
For example:
{
"template": {
"mappings": {
"properties": {
"@timestamp": {
"type": "date"
},
"check_command": {
"type": "keyword",
"time_series_dimension": true
},
"check_result": {
"properties": {
"check_source": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"command": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"execution_end": {
"type": "date"
},
"execution_start": {
"type": "date"
},
"execution_time": {
"type": "float"
},
"exit_status": {
"type": "long"
},
"latency": {
"type": "float"
},
"output": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"perfdata": {
"type": "nested",
"properties": {
"crit": {
"type": "long"
},
"label": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"max": {
"type": "long"
},
"min": {
"type": "long"
},
"unit": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"value": {
"type": "long"
},
"warn": {
"type": "long"
}
}
},
"schedule_end": {
"type": "date"
},
"schedule_start": {
"type": "date"
},
"state": {
"type": "long"
},
"vars_after": {
"properties": {
"attempt": {
"type": "long"
},
"reachable": {
"type": "boolean"
},
"state": {
"type": "long"
},
"state_type": {
"type": "long"
}
}
},
"vars_before": {
"properties": {
"attempt": {
"type": "long"
},
"reachable": {
"type": "boolean"
},
"state": {
"type": "long"
},
"state_type": {
"type": "long"
}
}
}
}
},
"current_check_attempt": {
"type": "long"
},
"host": {
"type": "keyword",
"time_series_dimension": true
},
"last_hard_state": {
"type": "long"
},
"last_state": {
"type": "long"
},
"max_check_attempts": {
"type": "long"
},
"reachable": {
"type": "boolean"
},
"service": {
"type": "keyword",
"time_series_dimension": true
},
"state": {
"type": "long"
},
"state_type": {
"type": "long"
},
"timestamp": {
"type": "date"
},
"type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
This could be a _component_template that is used for both Writers
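For illustration, both writers' index templates could then pull in such a shared component template via composed_of (the template and component names here are assumptions, not part of the PR):

curl -X PUT "localhost:9200/_index_template/metrics-icinga2" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["metrics-icinga2.*"],
  "data_stream": {},
  "composed_of": ["icinga2-shared-mappings", "icinga2@custom"]
}
'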
You are right on the checkable. We can make the host and service name timeseries dimensions. But since we are already writing checks into their own datastreams, I don't think the check_command needs to be a dimension as well.
I've also commented a few more nitpicks, but for the moment I want to primarily focus on two areas:
- Finalize a format/index-template that can also be used by the basic ElasticsearchWriter in the future. (Please see @martialblog's detailed comments in addition to mine)
- Minimizing the impact this has on ProcessCheckResult execution times by moving most stuff into the work queue.

I'm open to suggestions, especially regarding the data format.
/* Enable sending the threashold values as additional fields
 * with the service check metrics. If set to true, it will
 * send warn and crit for every performance data item.
 */
Suggested change:
-/* Enable sending the threashold values as additional fields
+/* Enable sending the threshold values as additional fields
 * with the service check metrics. If set to true, it will
 * send warn and crit for every performance data item.
 */
} catch (const StatusCodeException& es) {
	int status_code = es.GetStatusCode();
	if (status_code == static_cast<int>(http::status::bad_request)) {
I still don't like this.
The underlying type of http::status is unsigned, so you should at least use that in StatusCodeException instead of int. However, even better would be to move the exception class to the cpp file and directly store the enum type.
Suggested change:
-} catch (const StatusCodeException& es) {
-	int status_code = es.GetStatusCode();
-	if (status_code == static_cast<int>(http::status::bad_request)) {
+} catch (const StatusCodeException& es) {
+	if (es.GetStatusCode() == http::status::bad_request) {
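Combining this with the point below about keeping the status code in the message, a minimal sketch of the suggested exception shape might be (assumed names and layout, not the PR's actual code):

#include <boost/beast/http/status.hpp>
#include <stdexcept>
#include <string>

namespace http = boost::beast::http;

class StatusCodeException : public std::runtime_error
{
public:
	StatusCodeException(http::status status, const std::string& message, std::string body)
		// Embed the numeric code in the message so printing what() doesn't lose it.
		: std::runtime_error(message + " (HTTP " + std::to_string(static_cast<unsigned>(status)) + ")"),
		  m_Status(status), m_Body(std::move(body))
	{ }

	http::status GetStatusCode() const noexcept { return m_Status; }
	const std::string& GetBody() const noexcept { return m_Body; }

private:
	http::status m_Status;
	std::string m_Body;
};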
StatusCodeException(int status_code, String message, String body)
	: std::runtime_error(std::move(message)), m_StatusCode(status_code), m_Body(std::move(body)) {}
I think the status_code should be part of the message passed to std::runtime_error, since DiagnosticInformation() will only print that, and you're not catching the StatusCodeException directly in every instance, so that information would get lost otherwise.
Thanks, I did not know that. I'll make sure to add more information to the error message.
	${ICINGA2_CONFIGDIR}/features-available
)

Superfluous newline
namespace http = beast::http;

static void ExceptionHandler(const boost::exception_ptr& exp);
static Array::Ptr ExtractTemplateTags(const MacroProcessor::ResolverList& resolvers, const Array::Ptr& tags_tmpl,
tags_tmpl vs. tag_tmpl in the definition.
	return Convert::ToBool(m_CompiledFilter->Evaluate(frame));
}

void ElasticsearchDatastreamWriter::CheckResultHandler(const Checkable::Ptr& checkable, const CheckResult::Ptr& cr)
Since this function is executed synchronously during ProcessCheckResult, which is a bottleneck as it is, this should only do as much as it really needs to, then enqueue the rest of the work in the work queue, similarly to ElasticsearchWriter since #10420.
Essentially, just the state-information from the checkable needs to be gathered beforehand, so basically set those fields of ecs_host and ecs_service here, then pass it into the work queue, fill out the rest and build the document there.
Thanks, I actually never ran into the issue where ProcessCheckResult was the bottleneck and wanted to optimize for write speed to elasticsearch instead by having the values pre-computed. I'll restructure it as you suggested.
ProcessCheckResult is probably the busiest crossing in the entire program, depending on what features are enabled and how many checks are performed at which intervals. While it may look like it's blocking per-checkable, it's essentially blocking per-endpoint connection. We have been talking about making it process check results in parallel, at least per-checkable, but I don't think anybody is currently working on that.
I don't think write speed to elasticsearch will be impacted by moving these things onto the queue, since you have to go through the queue once anyway (currently just for sending) and at that point it doesn't matter on which side of the work queue the data is put together.
Dictionary::Ptr ecs_host = new Dictionary({
	{"name", host->GetDisplayName()},
	{"hostname", host->GetName()},
	{"soft_state", host->GetState()},
	{"hard_state", host->GetLastHardState()}
});
if (!host->GetZoneName().IsEmpty()) ecs_host->Set("zone", host->GetZoneName());

char _addr[16];
if (!host->GetAddress().IsEmpty() && inet_pton(AF_INET, host->GetAddress().CStr(), _addr) == 1) {
	ecs_host->Set("ip", host->GetAddress());
} else if (!host->GetAddress6().IsEmpty() && inet_pton(AF_INET6, host->GetAddress6().CStr(), _addr) == 1) {
	ecs_host->Set("ip", host->GetAddress6());
} else if (!host->GetAddress().IsEmpty()) {
	ecs_host->Set("fqdn", host->GetAddress());
}

Dictionary::Ptr ecs_service;
if (service) {
	ecs_service = new Dictionary({
		{"name", service->GetName()},
		{"display_name", service->GetDisplayName()},
		{"soft_state", service->GetState()},
		{"hard_state", service->GetLastHardState()}
	});
	if (!service->GetZoneName().IsEmpty()) ecs_service->Set("zone", service->GetZoneName());
}
ecs_service and ecs_host are identical in structure and in case this check result is for a service, the host's state isn't really relevant anyway. On the other hand, we're missing information about reachability and the current/max attempts. This also seems to make it harder on the consumer side to evaluate, given that they would need to check if service exists and then either work with the data from the "host" or the "service" subobject.
Unless there is a good reason for doing it this way, I'd suggest sticking with the way it's done in the old ElasticsearchWriter, have a generic checkable object ecs_checkable with a host field and an optional service field.
I split them like this because the objects host and service have a special meaning in the Elastic Common Schema.
Ok, then it makes perfect sense filling them out this way. I'm still very new to elasticsearch and will have to look at the ECS. Up to now I was operating under the assumption that the structure of the data sent to elasticsearch was entirely arbitrary.
Could you please add a link to the relevant documentation for these fields in the code where they are filled out? I think it would be helpful as a reminder and for other people reading the code that are as unfamiliar with ES as myself. 😄
String checkable_name = service == nullptr ? host->GetName() : service->GetName();
Dictionary::Ptr check_result = new Dictionary({
	{"checkable", checkable_name},
This seems to be redundant, given that we already have this information in the host/service/checkable object.
Yes, @martialblog has mentioned that as well. We can remove that.
Force-pushed from 9a83f44 to 9b202be.
Force-pushed from 4f050e7 to 9e06023.
General improvements to code and documentation, addressing comments by Mattia Codato and Johannes Schmidt.
Force-pushed from 9e06023 to 5c270a5.
Thanks @martialblog for the insight. I'd encourage you to read the commit messages, as I tried to lay out my reasoning for some decisions there. Also, take a look at the example config file, as I think it clears up some of your concerns.
I took a look at the filter option today and I think that's a very good idea to add something like this to a perfdata writer. We should definitely update the other writers with something similar when we get the chance.
	BOOST_THROW_EXCEPTION(ValidationError(this, { "filter" }, "Filter must be an expression."));
}

Superfluous newline
std::unique_ptr<Expression>(MakeLiteral(filter)),
std::unique_ptr<Expression>(MakeLiteral("call"))
This should be indented.
}

void ElasticsearchDatastreamWriter::ValidateFilter(const Lazy<Value> &lvalue, const ValidationUtils &utils) {
	Value filter = lvalue();
No need to take this by value, make it a const reference.
FunctionCallExpression *fexpr = new FunctionCallExpression(std::move(indexer), std::move(args));

m_CompiledFilter = std::move(fexpr);
Just assign this directly. Move doesn't do anything here, since this is a raw pointer.
 * You can use any attribute of the host, service, checkable or
 * checkresult (cr) objects in the filter expression.
 */
// filter = {{ "host.name == 'myhost' || service.name == 'myservice'" }}
Why is this quoted inside the lambda? Furthermore, if you remove the outer quotes, the inner single-quotes are not valid DSL syntax.
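Presumably the intended form is plain DSL inside the lambda, something like this (my reading of the reviewer's point, not a confirmed fix):

filter = {{ host.name == "myhost" || service.name == "myservice" }}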
Host::Ptr host;
Service::Ptr service;
tie(host, service) = GetHostService(checkable);
You can use structured bindings, like so:
Suggested change:
-Host::Ptr host;
-Service::Ptr service;
-tie(host, service) = GetHostService(checkable);
+auto [host, service] = GetHostService(checkable);
| frameNS->Set("checkable", checkable); | ||
| frameNS->Set("check_result", cr); | ||
| frameNS->Set("host", host); | ||
| frameNS->Set("service", service); |
"checkable" and "check_result" are redundant here. The user would need some logic anyway to know if they're dealing with a host or service, and using this would just reverse that logic. Sure, you could do something like match(checkable.name, {{{test*}}}) to get both the host and all its services, but at the cost of introducing a new convention, since we mostly only bind "host", "service" and "obj" in cases like the API-filters.
And the check result can also be retrieved via (host|service).last_check_result.
Host::Ptr host;
Service::Ptr service;
tie(host, service) = GetHostService(checkable);
Also use structured bindings here please.
/* The datastream namespace to use. This can be used to separate different
 * Icinga instances. Or for letting multiple Writers write to different
 * datastreams in the same Elasticsearch cluster. By using the filter option.
 * The Elasticsearch datastream name will be
 * "metrics-icinga2.{check}-{datastream_namespace}".
 */
The block comments in this file are a bit inconsistent. Can you please ensure they're in this format:
/* Foo
* bar
*/
Sorry for nitpicking again 😄
> **Note**
>
> This feature requires Elasticsearch to support timeseries datastreams and have the ecs component template installed.
> This feature was tested successfully with Elasticsearch 8.12 and 9.0.8.
Do we need to explicitly mention that the ElasticsearchDatastreamWriter will not work with OpenSearch? Since the current ElasticsearchWriter works with OpenSearch, users might expect the same here.
Did some testing with OpenSearch; luckily there is little that needs to change for this implementation to work with OpenSearch:
- The index-template.json needs to be changed; the users can do that themselves
- OpenSearch responds with charset=UTF-8, not charset=utf-8
@w1ll-i-code Can we loosen the charset requirement? Modern JSON should use UTF-8 anyway. https://www.rfc-editor.org/rfc/rfc8259
JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8
A "tolower()" might also work, but maybe @jschmidt-icinga has some ideas on how the codebase should handle it.
I can take care of the documentation regarding OpenSearch once this PR is merged.
if (response.result_int() > 299) {
	Log(LogCritical, "ElasticsearchDatastreamWriter")
		<< "Unexpected response code " << response.result_int() << " from URL '" << url->Format() << "'. Error: " << response.body();
	BOOST_THROW_EXCEPTION(StatusCodeException(
		response.result_int(),
		"Unexpected response code from Elasticsearch",
		response.body()
	));
}
You should not unconditionally log this as an error. For example, if the template already exists, a 400 error is returned without this being an error on our side that the user should be aware of.
So either make this LogDebug here and log errors when appropriate in the catch block outside TrySend (which you already do), or ensure we never expect error status codes unless something actually goes wrong, for example by checking if the template already exists before sending it.
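For example, the existence of the index template could be checked with a cheap HEAD request before attempting the PUT (template name assumed):

curl -s -o /dev/null -w "%{http_code}\n" -I "localhost:9200/_index_template/metrics-icinga2"
# prints 200 if the template exists, 404 otherwise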
@w1ll-i-code Maybe we need to handle the fact that CheckCommand objects can have names that are not allowed as index names. For example: This will result in:
Not sure yet what a good solution might be. Maybe some form of normalization, maybe just letting the users know they need to be careful with the object names, maybe both.
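A minimal sketch of such a normalization (purely illustrative, not part of the PR):

#include <cctype>
#include <string>

// Elasticsearch index names must be lowercase and must not contain
// characters such as \ / * ? " < > | , # or spaces.
std::string NormalizeDatastreamToken(std::string name)
{
	static const std::string forbidden = R"(\/*?"<>|,# )";

	for (char& c : name) {
		if (forbidden.find(c) != std::string::npos)
			c = '_';
		else
			c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
	}
	return name;
}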
PR for #10576. For now, this will be hidden behind a build flag until the feature is sufficiently tested and ready for production use. For all the changes made to the original Elasticsearch writer, please consult the individual commits.