Description
This is probably a bit of an edge case, but I have a crawl with seeds spread across a lot of hostnames, and many seeds at each hostname. Sometimes a server goes offline, which leads to really slow crawls: for each seed at that hostname, Browsertrix can take a long time to give up on connecting to the server, and then it retries a few times as well.
That’s all totally reasonable and expected, but it would be really nice if I could configure Browsertrix to delay crawling the other seeds at a given hostname when the first seed at that hostname fails with a connection error (DNS resolution or connection timeout, I think), and to discard those seeds entirely (so it never bothers trying to load them) if the first seed still fails after its retries are exhausted. In the vast majority of cases, that seems like reasonable and generally correct behavior, and it would avoid wasting a lot of time and resources on pointless requests.
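To make the request concrete, here is a rough sketch of the scheduling behavior I have in mind, written in Python purely for illustration. It is not based on how Browsertrix actually works internally: `crawl_seeds`, `fetch`, and `max_retries` are all hypothetical names, `fetch` is assumed to raise `ConnectionError` on DNS or connection failures, and "delay" is modeled by simply moving a seed to the back of the queue rather than by a real timer.

```python
from collections import deque
from urllib.parse import urlsplit


def crawl_seeds(seeds, fetch, max_retries=2):
    """Crawl seeds, letting one seed per hostname act as a probe.

    `fetch(url)` is a hypothetical callable that raises ConnectionError
    on DNS or connect failures. Returns (crawled, failed, skipped).
    """
    queue = deque(seeds)
    failures = {}   # hostname -> connection failures seen so far
    probe = {}      # hostname -> the seed currently being retried
    dead = set()    # hostnames given up on after retries ran out
    crawled, failed, skipped = [], [], []

    while queue:
        url = queue.popleft()
        host = urlsplit(url).hostname
        if host in dead:
            skipped.append(url)      # discard: never attempt this seed
            continue
        if host in probe and probe[host] != url:
            queue.append(url)        # delay: wait for the probe's outcome
            continue
        try:
            fetch(url)
        except ConnectionError:
            failures[host] = failures.get(host, 0) + 1
            if failures[host] > max_retries:
                dead.add(host)       # retries exhausted for this hostname
                probe.pop(host, None)
                failed.append(url)
            else:
                probe[host] = url
                queue.append(url)    # retry later, after other hosts
            continue
        # Success: the host is reachable, so release any delayed seeds.
        failures.pop(host, None)
        probe.pop(host, None)
        crawled.append(url)
    return crawled, failed, skipped
```

With this approach, an unreachable hostname costs one initial attempt plus `max_retries` retries in total, instead of that many attempts for every seed at the hostname. The remaining seeds are only loaded if the probe eventually succeeds.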