Option to ignore further seeds from a host that has connection failures

This is probably a bit of an edge case, but I have a crawl with seeds spread across a lot of hostnames, and many seeds at each of those hostnames. Sometimes a server goes offline, which leads to *really* slow crawls (for each seed at that hostname, Browsertrix can take a long time to give up on connecting to the server, and then it retries a few times as well).

That’s all totally reasonable and expected, but it would be really nice if I could configure Browsertrix to delay crawling the other seeds at a given hostname when the first seed at that hostname fails with a connection error (DNS resolution or connection timeout, I think), and to discard those seeds entirely (so it never bothers trying to load them) if the first seed still fails after its retries are exhausted. In the vast majority of cases, that seems like reasonable and generally correct behavior, and it would really help avoid wasting a lot of time and resources on pointless requests.
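
To make the request concrete, here is a rough sketch of the per-hostname bookkeeping I have in mind. Everything in it (`HostFailureTracker`, `recordConnectionFailure`, `decide`, the `defer`/`skip` outcomes) is a hypothetical name made up for illustration, not an existing Browsertrix API or option, and it assumes the crawler can tell connection-level failures apart from other page errors.

```typescript
// Hypothetical sketch only: none of these names exist in Browsertrix.
type HostState = "ok" | "delayed" | "dead";
type SeedDecision = "crawl" | "defer" | "skip";

class HostFailureTracker {
  // Last known state for each hostname, keyed by URL hostname.
  private states = new Map<string, HostState>();

  // Call when a seed fails with a connection-level error
  // (assumed here to mean DNS resolution failure or connection timeout).
  // `retriesExhausted` is true once the crawler has given up on that seed.
  recordConnectionFailure(seedUrl: string, retriesExhausted: boolean): void {
    const host = new URL(seedUrl).hostname;
    if (this.states.get(host) === "dead") return; // "dead" is sticky
    this.states.set(host, retriesExhausted ? "dead" : "delayed");
  }

  // Call when any seed on the host loads; the host is reachable again.
  recordSuccess(seedUrl: string): void {
    this.states.set(new URL(seedUrl).hostname, "ok");
  }

  // Decide what to do with the next seed pulled from the queue.
  decide(seedUrl: string): SeedDecision {
    const state = this.states.get(new URL(seedUrl).hostname) ?? "ok";
    if (state === "dead") return "skip"; // never attempt to load
    if (state === "delayed") return "defer"; // push to the back of the queue
    return "crawl";
  }
}

const tracker = new HostFailureTracker();
tracker.recordConnectionFailure("https://down.example/page-1", false);
console.log(tracker.decide("https://down.example/page-2")); // "defer"
tracker.recordConnectionFailure("https://down.example/page-1", true);
console.log(tracker.decide("https://down.example/page-3")); // "skip"
console.log(tracker.decide("https://up.example/page-1")); // "crawl"
```

Keying on hostname rather than on the full URL is the point: one failed seed is enough to hold back or drop every other seed queued for the same server.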

Option to ignore further seeds from a host that has connection failures #879
