Lock wait timeout exceeded for state_history when using retention #893

@oxzi

While working on another issue with a larger test instance, I ran into a database locking issue after enabling the Icinga DB retention.

The problems started after I added the following retention block to my config:

retention:
  history-days: 1
  sla-days: 1

Shortly after restarting Icinga DB, the following error appeared, always for the state_history table.

2025-02-25T10:58:39.859Z        FATAL   icingadb        Error 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
can't perform "INSERT INTO \"state_history\" (\"previous_soft_state\", \"check_attempt\", \"check_source\", \"scheduling_source\", \"environment_id\", \"state_type\", \"id\", \"previous_hard_state\", \"output\", \"max_check_attempts\", \"object_type\", \"event_time\", \"endpoint_id\", \"soft_state\", \"long_output\", \"host_id\", \"service_id\", \"hard_state\") VALUES (:previous_soft_state,:check_attempt,:check_source,:scheduling_source,:environment_id,:state_type,:id,:previous_hard_state,:output,:max_check_attempts,:object_type,:event_time,:endpoint_id,:soft_state,:long_output,:host_id,:service_id,:hard_state) ON DUPLICATE KEY UPDATE \"id\" = VALUES(\"id\")"
github.com/icinga/icinga-go-library/database.CantPerformQuery
        github.com/icinga/icinga-go-library@v0.4.0/database/utils.go:16
github.com/icinga/icinga-go-library/database.(*DB).NamedBulkExec.func1.(*DB).NamedBulkExec.func1.1.2.1
        github.com/icinga/icinga-go-library@v0.4.0/database/db.go:535
github.com/icinga/icinga-go-library/retry.WithBackoff
        github.com/icinga/icinga-go-library@v0.4.0/retry/retry.go:65
github.com/icinga/icinga-go-library/database.(*DB).NamedBulkExec.func1.(*DB).NamedBulkExec.func1.1.2
        github.com/icinga/icinga-go-library@v0.4.0/database/db.go:530
golang.org/x/sync/errgroup.(*Group).Go.func1
        golang.org/x/sync@v0.10.0/errgroup/errgroup.go:78
runtime.goexit
        runtime/asm_amd64.s:1700
retry deadline exceeded
github.com/icinga/icinga-go-library/retry.WithBackoff
        github.com/icinga/icinga-go-library@v0.4.0/retry/retry.go:100
github.com/icinga/icinga-go-library/database.(*DB).NamedBulkExec.func1.(*DB).NamedBulkExec.func1.1.2
        github.com/icinga/icinga-go-library@v0.4.0/database/db.go:530
golang.org/x/sync/errgroup.(*Group).Go.func1
        golang.org/x/sync@v0.10.0/errgroup/errgroup.go:78
runtime.goexit
        runtime/asm_amd64.s:1700

The database process list shows a DELETE statement running in parallel against this table. It looks like one generated by Icinga DB's retention feature, and at the time of the listing it had already been running for 3341 seconds (about 56 minutes).

MariaDB [icingadb]> show processlist;
+-----+----------+-----------------+----------+---------+------+----------+------------------------------------------------------------------------------------------------------+----------+
| Id  | User     | Host            | db       | Command | Time | State    | Info                                                                                                 | Progress |
+-----+----------+-----------------+----------+---------+------+----------+------------------------------------------------------------------------------------------------------+----------+
|  55 | icingadb | localhost:54518 | icingadb | Execute | 3341 | Updating | DELETE FROM state_history WHERE environment_id = ? AND event_time < ? ORDER BY event_time LIMIT 5000 |    0.000 |
| 144 | icingadb | localhost       | icingadb | Query   |    0 | starting | show processlist                                                                                     |    0.000 |
+-----+----------+-----------------+----------+---------+------+----------+------------------------------------------------------------------------------------------------------+----------+
2 rows in set (0.000 sec)

The crash is reproducible: sometimes it takes five minutes, sometimes an hour, but it always happens eventually. After I disabled retention, Icinga DB ran without errors again.

Labels: bug
