trafilatura/2.0.0 (+https://github.com/adbar/trafilatura) Scraper library #619

@robwent

Paste the full User-Agent String here


"trafilatura/2.0.0 (+https://github.com/adbar/trafilatura)"

Is this for Addition / Removal?

  • Addition
  • Removal
  • Keep a watch on this one

Did the User-Agent request robots.txt first?

  • Yes
  • No - Directly to sitemap

Post Log Excerpt to show User-Agent behavior (10-20 lines is enough)


80.193.158.65 - - [23/Feb/2025:11:18:36 +0000] "GET /yatco_tax_agreement-sitemap.xml HTTP/2.0" 200 567 "-" "trafilatura/2.0.0 (+https://github.com/adbar/trafilatura)"
80.193.158.65 - - [23/Feb/2025:11:18:39 +0000] "GET /post_tag-sitemap5.xml HTTP/2.0" 200 710 "-" "trafilatura/2.0.0 (+https://github.com/adbar/trafilatura)"
80.193.158.65 - - [23/Feb/2025:11:18:43 +0000] "GET /post_tag-sitemap4.xml HTTP/2.0" 200 1442 "-" "trafilatura/2.0.0 (+https://github.com/adbar/trafilatura)"
80.193.158.65 - - [23/Feb/2025:11:18:46 +0000] "GET /post_tag-sitemap3.xml HTTP/2.0" 200 1772 "-" "trafilatura/2.0.0 (+https://github.com/adbar/trafilatura)"
80.193.158.65 - - [23/Feb/2025:11:18:49 +0000] "GET /post_tag-sitemap2.xml HTTP/2.0" 200 1373 "-" "trafilatura/2.0.0 (+https://github.com/adbar/trafilatura)"
80.193.158.65 - - [23/Feb/2025:11:18:53 +0000] "GET /post_tag-sitemap.xml HTTP/2.0" 200 1346 "-" "trafilatura/2.0.0 (+https://github.com/adbar/trafilatura)"
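
For anyone wanting to check the same thing against their own logs, here is a minimal sketch that filters access-log lines by User-Agent and reports whether any of them requested /robots.txt. It assumes the Combined Log Format seen in the excerpt above. The names `LOG_RE` and `summarize_agent` are made up for this illustration and are not part of any project. The two sample lines are copied from the excerpt. A `False` result only says that no robots.txt request appears in the lines supplied, not that one was never made.

```python
import re

# Assumed layout (Combined Log Format, as in the excerpt above):
# ip identity user [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def summarize_agent(lines, agent_substring):
    """Return (paths, requested_robots) for lines whose User-Agent contains agent_substring.

    Hypothetical helper for illustration only.
    """
    paths = []
    for line in lines:
        match = LOG_RE.match(line.strip())
        # Lines that do not fit the assumed format are skipped silently.
        if match and agent_substring in match.group("agent"):
            paths.append(match.group("path"))
    # Ignore any query string when comparing against /robots.txt.
    requested_robots = any(p.split("?", 1)[0] == "/robots.txt" for p in paths)
    return paths, requested_robots


# Two lines copied from the log excerpt in this report.
sample = [
    '80.193.158.65 - - [23/Feb/2025:11:18:36 +0000] "GET /yatco_tax_agreement-sitemap.xml HTTP/2.0" 200 567 "-" "trafilatura/2.0.0 (+https://github.com/adbar/trafilatura)"',
    '80.193.158.65 - - [23/Feb/2025:11:18:53 +0000] "GET /post_tag-sitemap.xml HTTP/2.0" 200 1346 "-" "trafilatura/2.0.0 (+https://github.com/adbar/trafilatura)"',
]

paths, requested_robots = summarize_agent(sample, "trafilatura/")
print(paths)
print(requested_robots)
```

To run it against a real log, pass an open file object instead of `sample`, since the function only needs an iterable of lines.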
