Feature: extract only files matching a regexp

When working with a megawarc, any reasonably broad search will return many hits, possibly too many to extract efficiently by hand-writing a `dd` call for each one (see https://github.com/chfoo/warcat/issues/7 ).

It would be useful if you could pass warcat a regexp like `.*foo\.wordpress\.com.*` to extract all files in a megawarc that belong to a particular website. This can be approximated by telling warcat to extract everything and then deleting the non-matches with `find` or another shell-script approach, but that costs far more disk IO and temporary storage, and it means having to work with `find`. (Filtering during extraction might also be faster beyond the disk IO savings, depending on whether the format stores filenames in a way that lets warcat skip over non-matching records. I don't know the details there.)
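To make the request concrete, here is a rough sketch of the kind of filtering I mean. It is not warcat code and does not use warcat's API: `iter_matching_records` is a hypothetical helper written against the general WARC record layout (a version line, `Name: value` headers, a blank line, then `Content-Length` bytes of content). It assumes an uncompressed, seekable stream and that each record's URL is in its `WARC-Target-URI` header.

```python
import io
import re


def iter_matching_records(stream, pattern):
    """Yield (target_uri, content_bytes) for records whose URI matches pattern.

    Hypothetical helper, not part of warcat. `stream` is a binary,
    seekable file object positioned at the start of a WARC file.
    """
    regex = re.compile(pattern)
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank separator lines between records
        # `line` is now the version line (e.g. b"WARC/1.0"); read the headers.
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers.get("content-length", "0"))
        uri = headers.get("warc-target-uri", "")
        if regex.search(uri):
            yield uri, stream.read(length)
        else:
            # Skip the content block without reading it into memory.
            stream.seek(length, io.SEEK_CUR)
```

Because the check uses `regex.search`, a pattern like `foo\.wordpress\.com` behaves the same as `.*foo\.wordpress\.com.*`. For a gzip-compressed megawarc, a stream from `gzip.open(path, "rb")` should also work, but skipping a record there still means decompressing through it, so the saving would be mostly in disk writes and temporary storage rather than read time.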


Feature: extract only files matching a regexp #8
