Labels: bug (Something isn't working)
Description
Describe the bug
Uploading and indexing a large PDF that contains scans uses Tesseract for OCR, but it is too slow and the request times out.
Tesseract is still running when the extractor times out:
unstract-backend | 172.28.0.1 - - [17/Mar/2025:09:57:30 +0000] "GET /api/v1/socket/?EIO=4&transport=websocket HTTP/1.1" 400 25 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36"
unstract-x2text-service | [2025-03-17 09:57:30 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:7)
unstract-x2text-service | [2025-03-17 09:57:30 +0000] [7] [ERROR] Error handling request /api/v1/x2text/process
unstract-x2text-service | Traceback (most recent call last):
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/gunicorn/workers/sync.py", line 134, in handle
unstract-x2text-service | self.handle_request(listener, req, client, addr)
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/gunicorn/workers/sync.py", line 177, in handle_request
unstract-x2text-service | respiter = self.wsgi(environ, resp.start_response)
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/flask/app.py", line 1498, in __call__
unstract-x2text-service | return self.wsgi_app(environ, start_response)
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/flask/app.py", line 1473, in wsgi_app
unstract-x2text-service | response = self.full_dispatch_request()
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/flask/app.py", line 880, in full_dispatch_request
unstract-x2text-service | rv = self.dispatch_request()
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/flask/app.py", line 865, in dispatch_request
unstract-x2text-service | return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return]
unstract-x2text-service | File "/app/app/authentication_middleware.py", line 16, in wrapper
unstract-x2text-service | return func(*args, **kwargs)
unstract-x2text-service | File "/app/app/controllers/controller.py", line 120, in process
unstract-x2text-service | response = requests.request(
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/requests/api.py", line 59, in request
unstract-x2text-service | return session.request(method=method, url=url, **kwargs)
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
unstract-x2text-service | resp = self.send(prep, **send_kwargs)
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
unstract-x2text-service | r = adapter.send(request, **kwargs)
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/requests/adapters.py", line 667, in send
unstract-x2text-service | resp = conn.urlopen(
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/urllib3/connectionpool.py", line 789, in urlopen
unstract-x2text-service | response = self._make_request(
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/urllib3/connectionpool.py", line 536, in _make_request
unstract-x2text-service | response = conn.getresponse()
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/urllib3/connection.py", line 464, in getresponse
unstract-x2text-service | httplib_response = super().getresponse()
unstract-x2text-service | File "/usr/local/lib/python3.9/http/client.py", line 1377, in getresponse
unstract-x2text-service | response.begin()
unstract-x2text-service | File "/usr/local/lib/python3.9/http/client.py", line 320, in begin
unstract-x2text-service | version, status, reason = self._read_status()
unstract-x2text-service | File "/usr/local/lib/python3.9/http/client.py", line 281, in _read_status
unstract-x2text-service | line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
unstract-x2text-service | File "/usr/local/lib/python3.9/socket.py", line 716, in readinto
unstract-x2text-service | return self._sock.recv_into(b)
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/gunicorn/workers/base.py", line 204, in handle_abort
unstract-x2text-service | sys.exit(1)
unstract-x2text-service | SystemExit: 1
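The `WORKER TIMEOUT` line and the `handle_abort` frame suggest that Gunicorn killed the sync worker while it was still waiting for the upstream response. One possible workaround, sketched below, is to raise Gunicorn's `timeout` setting. The file name and both values are assumptions for illustration, since this report does not show how the x2text service actually starts Gunicorn.

```python
# gunicorn.conf.py
# Hypothetical config file: the real startup configuration of the
# x2text service is not shown in this report.

# Seconds a worker may stay silent before the master kills and restarts it.
# Gunicorn's default is 30; 900 is an arbitrary value chosen for illustration.
timeout = 900

# Seconds a worker gets to finish in-flight requests after a restart signal.
graceful_timeout = 60
```

This only keeps the worker alive longer. It does not make the OCR step itself any faster.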
To reproduce
LLM profile:
- Name: ollama-deepseek-r1
- LLM: ollama-deepseek-r1
- Embedding Model: ollama-emb-deepseek-r1
- Vector Database: pg-vdb-1
- Text Extractor: unstructured-io-1
Expected behavior
Indexing completes successfully.
Environment details
- Version: latest, with the optional profile
Additional context
Question
Is there a way to replace the old Tesseract, which is not GPU-accelerated, with the Llama 3.2 Vision model?