Continuous replication jobs missing from _active_tasks after changes_reader_died,{timeout,ibrowse_stream_cleanup} message appears in the log #5609

@Sdas0000

Some replication jobs did not start again after a changes_reader_died,{timeout,ibrowse_stream_cleanup} message appeared in the log.

Description

Jul 24 14:36:52 xxxx-xxxx-ro-us-west1-x couchdb[826059]:

ChangesReader process died with reason: {changes_reader_died,{timeout,ibrowse_stream_cleanup}}
Replication 0c59025774c3cc12f61c49f2e2c02c5d+continuous (https://xxxx-xxxx-master-x.pr-xxxx-xxxx.str.xxxxxxx.com/xxxx_xxxx-300/ -> https://xxxx-xxxxx-ro-us-west1-x.pr-xxxxx-xxxxxx.str.xxxxx.com/xxxx_xxxx-300/) failed: {changes_reader_died,{{timeout,ibrowse_stream_cleanup}}
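To correlate log entries like the ones above with scheduler jobs, the affected replication IDs can be pulled out of the log. The following is a minimal sketch, not a CouchDB tool: the regular expression is modelled only on the single "Replication ... failed" line quoted in this report, and the function name `find_changes_reader_failures` is hypothetical. Other CouchDB versions or log formats may need a different pattern.

```python
import re

# Pattern modelled on the one "Replication ... failed" line quoted in this
# report (an assumption; other log formats may differ).
FAILED_RE = re.compile(
    r"Replication (?P<rep_id>\S+) "
    r"\((?P<source>\S+) -> (?P<target>\S+)\) "
    r"failed: (?P<reason>.*)"
)


def find_changes_reader_failures(log_lines):
    """Return (rep_id, source, target) for every failed replication whose
    failure reason mentions changes_reader_died."""
    hits = []
    for line in log_lines:
        match = FAILED_RE.search(line)
        if match and "changes_reader_died" in match.group("reason"):
            hits.append(
                (match.group("rep_id"), match.group("source"), match.group("target"))
            )
    return hits
```

The returned replication IDs have the same form as the "id" field of a _scheduler/jobs entry, so they can be used to look up the matching jobs.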

When the ChangesReader process died for xxxx_xxxx-300, the _scheduler/jobs entry did not crash and restart (the last crash was at 2025-07-23T15:22:33):

{
"database": "_replicator",
"id": "0c59025774c3cc12f61c49f2e2c02c5d+continuous",
"pid": "<0.1761.7093>",
"source": "https://xxxx-xxxx-master-x.pr-xxxxx-xxxxx.str.xxxx.com/xxxx_xxxx-300/",
"target": "https://xxxx-xxxxx-ro-us-west1-x.pr-xxxx-xxxxxx.str.xxxxx.com/xxxx_xxxx-300/",
"user": null,
"doc_id": "xxxxx_xxxx_replication_300",
"info": {
"revisions_checked": 3888719,
"missing_revisions_found": 421066,
"docs_read": 421064,
"docs_written": 421064,
"changes_pending": 0,
"doc_write_failures": 0,
"bulk_get_docs": 421064,
"bulk_get_attempts": 421064,
"checkpointed_source_seq": "22113348-g1AAAAJveJyl0EEOgjAQBdBGSFx4Fgi1ILKSQ3CBdqgWUoopuNYzuPI2eiVPgBMwbpvUzUwyk_x5GU0ICVVQkxz6C6halM0ouwg4KBl1fBiljbIY4rONhrG3OOsNt6DixuDKcK0xYMWJOEzT1KpAkGpz6nC25ls41jvmn-xQMadKlFjF9QfT7xkmISlkvvMPd8CoE2ZCrOSGDW3PBccfM46mlGc08T_w99cW3GvBfT93r2Yc2xcspbn_gfYDW0XUDA",
"source_seq": "22113348-g1AAAAJveJyl0EEOgjAQBdBGSFx4Fgi1ILKSQ3CBdqgWUoopuNYzuPI2eiVPgBMwbpvUzUwyk_x5GU0ICVVQkxz6C6halM0ouwg4KBl1fBiljbIY4rONhrG3OOsNt6DixuDKcK0xYMWJOEzT1KpAkGpz6nC25ls41jvmn-xQMadKlFjF9QfT7xkmISlkvvMPd8CoE2ZCrOSGDW3PBccfM46mlGc08T_w99cW3GvBfT93r2Yc2xcspbn_gfYDW0XUDA",
"through_seq": "22113348-g1AAAAJveJyl0EEOgjAQBdBGSFx4Fgi1ILKSQ3CBdqgWUoopuNYzuPI2eiVPgBMwbpvUzUwyk_x5GU0ICVVQkxz6C6halM0ouwg4KBl1fBiljbIY4rONhrG3OOsNt6DixuDKcK0xYMWJOEzT1KpAkGpz6nC25ls41jvmn-xQMadKlFjF9QfT7xkmISlkvvMPd8CoE2ZCrOSGDW3PBccfM46mlGc08T_w99cW3GvBfT93r2Yc2xcspbn_gfYDW0XUDA"
},
"history": [
{
"timestamp": "2025-07-23T15:22:33Z",
"type": "started"
},
{
"timestamp": "2025-07-23T15:22:33Z",
"type": "crashed",
"reason": "{changes_reader_died,{timeout,ibrowse_stream_cleanup}}"
},

Steps to Reproduce

Expected Behaviour

The replication job should appear in _active_tasks, possibly in the 'running' or 'pending' state. The actual problem is that the job never restarts once it has stopped. It looks like CouchDB assumes that if the job exists in _scheduler/docs (or _scheduler/jobs), the corresponding _active_tasks entry also exists.

We verified that new documents added to the source never appear in the target database, even after waiting a few days.

As per the CouchDB documentation: "Changed in version 2.1.0: Because of how the scheduling replicator works, continuous replication jobs could be periodically stopped and then started later. When they are not running they will not appear in the _active_tasks endpoint"
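The symptom described here (a job that _scheduler/jobs reports with a pid, but that has no entry in _active_tasks) can be checked by comparing the two responses. The sketch below is only an illustration under stated assumptions: it takes the already parsed JSON bodies of `GET /_scheduler/jobs` and `GET /_active_tasks`, assumes the job shape shown in this report ("id", "pid", "doc_id", "history"), and assumes that replication entries in _active_tasks carry "type": "replication" and a "replication_id" field. The function name `jobs_missing_from_active_tasks` is hypothetical, and fetching the two responses is left out.

```python
def jobs_missing_from_active_tasks(scheduler_jobs, active_tasks):
    """Return scheduler jobs that have a pid but no matching replication
    task in _active_tasks.

    scheduler_jobs: parsed body of GET /_scheduler/jobs (dict with "jobs").
    active_tasks:   parsed body of GET /_active_tasks (list of dicts).
    """
    # Assumed field names for replication tasks: "type" and "replication_id".
    active_ids = {
        task.get("replication_id")
        for task in active_tasks
        if task.get("type") == "replication"
    }

    missing = []
    for job in scheduler_jobs.get("jobs", []):
        # Jobs without a pid are not running, so their absence from
        # _active_tasks is expected and is not reported.
        if job.get("pid") is None:
            continue
        if job.get("id") not in active_ids:
            timestamps = [
                event["timestamp"]
                for event in job.get("history") or []
                if "timestamp" in event
            ]
            missing.append(
                {
                    "id": job.get("id"),
                    "doc_id": job.get("doc_id"),
                    # ISO 8601 UTC timestamps sort correctly as strings.
                    "last_event": max(timestamps) if timestamps else None,
                }
            )
    return missing
```

A job reported by this check whose "last_event" is days old, while the source keeps receiving new documents, would match the stuck state described in this issue.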

Note: Sometimes, when changes_reader_died,{timeout,ibrowse_stream_cleanup} occurs for a database, the _scheduler/jobs entry does crash and restart, and everything returns to normal.

Note: To restart replication for the affected databases, we have to bounce CouchDB on that node. However, the same issue recurs on other databases on other nodes after a few weeks.

Is there any other way to restart replication without bouncing the node?

Note: We tried updating the failed replication's user ID and password with invalid values, expecting this to crash the _scheduler/jobs entry for that replication so that it would restart once the correct credentials were restored. However, this did not crash the replications that were missing from _active_tasks, which forced us to bounce CouchDB on that node to restart replication.

Your Environment

CouchDB version:

{"couchdb":"Welcome","version":"3.4.3","git_sha":"f1a47e66","uuid":"67fc0abd32xxx0c38f75cc627b77411d9f","features":["access-ready","partitioned","pluggable-storage-engines","reshard","scheduler"],"vendor":{"name":"The Apache Software Foundation"}}

  • Operating system and version:

NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.5 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu

Additional Context
