Correctly handle colocated table repair #4565

@timtimb0t

The issue occurred on my own staging environment while testing LWT with tablets enabled. The error:

< t:2025-08-22 11:46:39,578 f:db_log_reader.py l:123  c:sdcm.db_log_reader   p:DEBUG > 2025-08-22T11:46:15.339+00:00 longevity-lwt-24h-lwt-test-db-node-353e20c2-0-4     !INFO | scylla[4700]:  [shard 0:strm] repair - repair[e0f68f3e-a74a-4d3e-9920-819b62a70b67]: Finished to process repair_flush_hints_batchlog_request from node=10.142.0.34 updated=false flush_hints_batchlog_time=  1755863136 flush_cache_time=60000ms flush_duration=0s
< t:2025-08-22 11:46:39,579 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: Run:		35448721-7f4d-11f0-805e-42010a8e0032
< t:2025-08-22 11:46:39,579 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: Status:		ERROR
< t:2025-08-22 11:46:39,579 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: Cause:		see more errors in logs: master 10.142.0.35 keyspace lwt_keyspace table lwt_io$paxos command 0: schedule tablet repair task: giving up after 10 attempts: agent [HTTP 500] std::runtime_error (Can't set repair request on a co-located table)
< t:2025-08-22 11:46:39,579 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: Start time:	22 Aug 25 11:43:23 UTC
< t:2025-08-22 11:46:39,579 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: End time:	22 Aug 25 11:46:22 UTC
< t:2025-08-22 11:46:39,579 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: Duration:	2m59s
< t:2025-08-22 11:46:39,579 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: Progress:	99%
< t:2025-08-22 11:46:39,579 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: Intensity:	1
< t:2025-08-22 11:46:39,579 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: Parallel:	0
< t:2025-08-22 11:46:39,579 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: Host:	invalid IP
< t:2025-08-22 11:46:39,579 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: Datacenters:	
< t:2025-08-22 11:46:39,580 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>:   - us-east1
< t:2025-08-22 11:46:39,580 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: 
< t:2025-08-22 11:46:39,580 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: ╭───────────────────────────────┬────────────────────────────────┬──────────┬──────────╮
< t:2025-08-22 11:46:39,580 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: │ Keyspace                      │                          Table │ Progress │ Duration │
< t:2025-08-22 11:46:39,580 f:db_log_reader.py l:123  c:sdcm.db_log_reader   p:DEBUG > 2025-08-22T11:46:14.999+00:00 longevity-lwt-24h-lwt-test-db-node-353e20c2-0-3     !INFO | scylla[6438]:  [shard 5: gms] hints_manager - Draining starts for 5d5f5d46-e93b-44f1-b53c-733eab1e35e6
< t:2025-08-22 11:46:39,580 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: ├───────────────────────────────┼────────────────────────────────┼──────────┼──────────┤
< t:2025-08-22 11:46:39,580 f:db_log_reader.py l:123  c:sdcm.db_log_reader   p:DEBUG > 2025-08-22T11:46:15.753+00:00 longevity-lwt-24h-lwt-test-db-node-353e20c2-0-5     !INFO | scylla[5888]:  [shard 0: gms] tablets - Set sstables_repaired_at=0 table=52fc3eb0-7f41-11f0-80b5-8f6de9b0b120 tablet=69
< t:2025-08-22 11:46:39,580 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: │ banned_keyspace               │                         table1 │ 100%     │ 4s       │
< t:2025-08-22 11:46:39,580 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: ├───────────────────────────────┼────────────────────────────────┼──────────┼──────────┤
< t:2025-08-22 11:46:39,580 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: │ lwt_keyspace                  │                         lwt_io │ 100%     │ 13s      │
< t:2025-08-22 11:46:39,580 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: │ lwt_keyspace                  │                   lwt_io$paxos │ 73%/27%  │ 2m42s    │
< t:2025-08-22 11:46:39,580 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: ├───────────────────────────────┼────────────────────────────────┼──────────┼──────────┤
< t:2025-08-22 11:46:39,580 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: │ system_distributed_everywhere │ cdc_generation_descriptions_v2 │ 100%     │ 1s       │
< t:2025-08-22 11:46:39,580 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: ├───────────────────────────────┼────────────────────────────────┼──────────┼──────────┤
< t:2025-08-22 11:46:39,580 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: │ system_distributed            │      cdc_generation_timestamps │ 100%     │ 1s       │
< t:2025-08-22 11:46:39,580 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: │ system_distributed            │    cdc_streams_descriptions_v2 │ 100%     │ 1s       │
< t:2025-08-22 11:46:39,580 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: │ system_distributed            │                 service_levels │ 100%     │ 1s       │
< t:2025-08-22 11:46:39,580 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: │ system_distributed            │              view_build_status │ 100%     │ 1s       │
< t:2025-08-22 11:46:39,580 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: ├───────────────────────────────┼────────────────────────────────┼──────────┼──────────┤
< t:2025-08-22 11:46:39,580 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: │ system_replicated_keys        │                 encrypted_keys │ 100%     │ 1s       │
< t:2025-08-22 11:46:39,580 f:base.py         l:229  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: ╰───────────────────────────────┴────────────────────────────────┴──────────┴──────────╯
< t:2025-08-22 11:46:39,581 f:base.py         l:141  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.0.50>: Command "sudo sctool  -c 57d3fa0a-508e-490b-a4aa-bf37417d9b83 progress repair/3617483c-6936-4da5-b283-7d0e45bb7a7e" finished with status 0

No keyspaces were passed as parameters when invoking the SM repair. I'm not sure whether this is a bug, but it seems that SM either shouldn't report this error or shouldn't try to repair this table (or colocated tables in general) in the first place.
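
To illustrate the behavior suggested above, here is a minimal sketch in Python. It is not SM code. The helper names are hypothetical. Treating a `$paxos` suffix as the marker of a colocated table is an assumption based only on the table name in the log above. The error text is copied from the `Cause:` line.

```python
# Hypothetical helpers, not part of Scylla Manager or SCT.
# The error text is taken verbatim from the "Cause:" line in the log above.
COLOCATED_ERROR = "Can't set repair request on a co-located table"

# Assumption: colocated LWT state tables are named "<base_table>$paxos",
# as seen for lwt_keyspace.lwt_io$paxos in this report.
PAXOS_SUFFIX = "$paxos"


def is_colocated_repair_error(cause: str) -> bool:
    """Return True if a repair failure cause mentions the colocated-table error."""
    return COLOCATED_ERROR in cause


def split_repair_targets(tables):
    """Split (keyspace, table) pairs into tables to repair and tables to skip."""
    to_repair, skipped = [], []
    for keyspace, table in tables:
        if table.endswith(PAXOS_SUFFIX):
            skipped.append((keyspace, table))
        else:
            to_repair.append((keyspace, table))
    return to_repair, skipped


if __name__ == "__main__":
    targets = [("lwt_keyspace", "lwt_io"), ("lwt_keyspace", "lwt_io$paxos")]
    to_repair, skipped = split_repair_targets(targets)
    print("repair:", to_repair)
    print("skip:", skipped)
```

Either approach would avoid the `ERROR` status here: filter such tables out before scheduling, or recognize the error and skip the table instead of retrying 10 times.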

Argus run
