Description
Job 2mt13kxolzln2i6awfxyprnud crashed with this traceback:
Pattern ^https?://www\.pinterest.\com/.*\.js$ is invalid (error: bad escape \c at position 25). Ignored.
ERROR Fatal exception.
Traceback (most recent call last):
File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20241016/lib/python3.6/site-packages/wpull/application/app.py", line 157, in run
yield from pipeline.process()
File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20241016/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 194, in process
yield from self._process_one_worker()
File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20241016/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 215, in _process_one_worker
task.result()
File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20241016/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 119, in process
item = yield from self.process_one(_worker_id=worker_id)
File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20241016/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 103, in process_one
yield from task.process(item)
File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20241016/lib/python3.6/site-packages/wpull/application/tasks/download.py", line 492, in process
yield from session.app_session.factory['Processor'].process(session)
File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20241016/lib/python3.6/site-packages/wpull/processor/delegate.py", line 29, in process
return (yield from processor.process(item_session))
File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20241016/lib/python3.6/site-packages/wpull/processor/web.py", line 92, in process
return (yield from session.process())
File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20241016/lib/python3.6/site-packages/wpull/processor/web.py", line 174, in process
ok = yield from self._process_robots()
File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20241016/lib/python3.6/site-packages/wpull/processor/web.py", line 201, in _process_robots
request))
File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20241016/lib/python3.6/site-packages/wpull/processor/web.py", line 367, in _should_fetch_reason_with_robots
self._fetch_rule.check_initial_web_request(self._item_session, request)
File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20241016/lib/python3.6/site-packages/wpull/processor/rule.py", line 179, in check_initial_web_request
item_session, verdict, reason, test_info
File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20241016/lib/python3.6/site-packages/wpull/processor/rule.py", line 130, in consult_hook
PluginFunctions.accept_url, item_session, verdict, reasons,
File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20241016/lib/python3.6/site-packages/wpull/application/hook.py", line 81, in call
return self._callbacks[name](*args, **kwargs)
File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20241016/lib/python3.6/site-packages/wpull/application/plugin.py", line 49, in wrapper
return func(*args, **kwargs)
File "archive_bot_plugin.py", line 227, in accept_url
pattern = self.settings.ignore_url(item_session.url_record)
File "/home/archivebot/ArchiveBot-c/pipeline/archivebot/wpull/settings.py", line 50, in ignore_url
return self.ignoracle.ignores(record_info)
File "/home/archivebot/ArchiveBot-c/pipeline/archivebot/wpull/ignoracle.py", line 110, in ignores
self._compiled.append((pattern, compiledPattern))
UnboundLocalError: local variable 'compiledPattern' referenced before assignment

This crash will only happen when the invalid ignore pattern appears first in the pattern set iterator. Otherwise, the previous ignore pattern will be duplicated (which causes no harm apart from a very minor performance impact).
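The fix is for the exception handler to continue to the next pattern, so that the append in ignoracle.py (line 110 in the traceback) is skipped for invalid patterns.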
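A minimal sketch of the proposed fix follows. It is not the actual code from ignoracle.py: the function name `compile_patterns` and the variable names are hypothetical, and only the compile-then-append structure is inferred from the traceback and the log message above.

```python
import re


def compile_patterns(patterns):
    """Compile each ignore pattern, skipping any that are invalid.

    Hypothetical stand-in for the loop in ignoracle.py; names are assumptions.
    """
    compiled = []
    for pattern in patterns:
        try:
            compiled_pattern = re.compile(pattern)
        except re.error as error:
            print('Pattern %s is invalid (error: %s). Ignored.' % (pattern, error))
            # Without this continue, execution falls through to the append.
            # compiled_pattern is then either unbound (invalid pattern comes
            # first, giving UnboundLocalError) or still holds the previous
            # pattern (which gets appended a second time).
            continue
        compiled.append((pattern, compiled_pattern))
    return compiled
```

With the `continue` in place, an invalid pattern such as the Pinterest one from the log is reported and skipped regardless of where it appears in the iteration order.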