
sebastian-nagel (Contributor)

The plot showing "new items" is difficult to read.

It shows the number of "new page captures" (which is simply the crawl size) as well as the numbers of new (never seen before) URLs and content digests. The digest count is almost equal to the number of pages and therefore carries little information.

The most informative series is the number of new URLs, so this PR drops the other two:

Show only URLs in the plot of "new items" to make it more readable.
The numbers of new "pages" (a.k.a. "captures") and "digests" are not
really informative: the former is just the size of the crawl, and the
latter is close to that size because there are very few exact
duplicates with identical content digests. Drop these two item types
and keep only the URLs.
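To illustrate why only the URL series is informative, here is a minimal sketch, not the project's actual plotting code. It assumes each crawl is a list of (URL, content digest) captures and that an item counts as "new" if it was not seen in any earlier crawl. The function names `count_new_items` and `url_series_only` and the toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy representation: one capture = (url, content_digest)
Capture = Tuple[str, str]


def count_new_items(crawls: Dict[str, List[Capture]]) -> Dict[str, Dict[str, int]]:
    """Count new page captures, URLs and digests per crawl (crawls in dict order)."""
    seen_urls: Set[str] = set()
    seen_digests: Set[str] = set()
    counts: Dict[str, Dict[str, int]] = {}
    for crawl, captures in crawls.items():
        urls = {url for url, _ in captures}
        digests = {digest for _, digest in captures}
        counts[crawl] = {
            # every capture is new, so this is just the crawl size
            "page": len(captures),
            # URLs never seen in an earlier crawl
            "url": len(urls - seen_urls),
            # digests never seen before: only exact duplicates are excluded
            "digest": len(digests - seen_digests),
        }
        seen_urls |= urls
        seen_digests |= digests
    return counts


def url_series_only(counts: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Keep only the "url" item type, i.e. the series left in the plot."""
    return {crawl: c["url"] for crawl, c in counts.items()}


crawls = {
    "crawl-1": [("a", "d1"), ("b", "d2"), ("c", "d3")],
    "crawl-2": [("a", "d4"), ("b", "d2"), ("d", "d5")],
}
counts = count_new_items(crawls)
print(counts)
# {'crawl-1': {'page': 3, 'url': 3, 'digest': 3}, 'crawl-2': {'page': 3, 'url': 1, 'digest': 2}}
print(url_series_only(counts))
# {'crawl-1': 3, 'crawl-2': 1}
```

In the second toy crawl, the "page" count equals the crawl size, and the "digest" count is only reduced by the single unchanged page `b`. The "url" count is the only value that differs substantially from the crawl size.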

@thunderpoot (Member)

That does look much cleaner.
