A simple Flask application to extract information from invoice PDFs. It can currently identify the total amount and the description of the items.
- PDF Parsing: Reads and extracts text from PDF invoice files.
- Total Amount Extraction: Identifies and extracts the total amount from the invoice.
- Description Extraction: Captures the description of the items listed in the invoice.
- Simple Flask API: Provides an easy-to-use API endpoint to upload and process invoices.
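The README does not say how the total amount is located in the text pulled from the PDF. As an illustration only, a simple regex heuristic might work like this. `TOTAL_PATTERN` and `extract_total_amount` are hypothetical names. The rule that the total sits on a line containing the word "Total" (and not "Subtotal"), followed by an amount with two decimal places, is an assumption and not necessarily how this project does it.

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical heuristic (not taken from the project): match a line that
# contains the word "total" but is not a "subtotal"/"sub-total" line, and
# capture the first two-decimal amount that follows it,
# e.g. "Total Due: $1,100.00" -> "1,100.00".
TOTAL_PATTERN = re.compile(
    r"(?im)^(?!.*\bsub[\s-]*total\b).*?\btotal\b[^\d\n]*([\d,]+\.\d{2})"
)


def extract_total_amount(text: str) -> Optional[Decimal]:
    """Return the amount on the last matching 'Total' line, or None."""
    matches = TOTAL_PATTERN.findall(text)
    if not matches:
        return None
    # Assume the grand total is the last "Total" line on the invoice.
    return Decimal(matches[-1].replace(",", ""))


sample = (
    "Widget x2            1,090.00\n"
    "Subtotal:            1,090.00\n"
    "Tax:                    10.00\n"
    "Total Due:          $1,100.00\n"
)
print(extract_total_amount(sample))  # 1100.00
```

Using `Decimal` instead of `float` keeps the amount exact, which matters for money. A real invoice parser would need to handle more layouts (other currencies, comma decimal separators, totals split across lines).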
Requirements:

- Python 3.x
- pip (Python package installer)
Installation:

- Clone the repository:

  ```bash
  git clone https://github.com/vectorc0de/nlp_api
  cd nlp_api
  ```
- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Set the Flask environment (if needed):

  ```bash
  export FLASK_APP=your_app_file.py  # replace with the name of your main application file
  export FLASK_DEBUG=1               # optional: development/debug mode (older Flask versions used FLASK_ENV=development)
  ```
- Run the Flask development server:

  ```bash
  flask run
  ```

  Or, if your main file is `app.py` and it calls `app.run()`, start it directly:

  ```bash
  python app.py
  ```
- The application will be available at http://127.0.0.1:5000/ by default (or on the port you choose, e.g. `flask run --port 8000`).
To process an invoice, send a POST request to the `/extract_info` endpoint, uploading the PDF as the `pdf_file` field of a `multipart/form-data` form.
Example using curl:

```bash
curl -X POST -F "pdf_file=@/path/to/your/invoice.pdf" http://127.0.0.1:5000/extract_info
```
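If you would rather call the endpoint from Python without extra dependencies, here is a minimal sketch that uses only the standard library. It builds the same multipart upload as the curl command (field name `pdf_file`). `build_multipart`, `make_extract_request`, `extract_info` and `EXTRACT_URL` are hypothetical names. The response format of `/extract_info` is not documented here, so the sketch returns the raw response body as text.

```python
import urllib.request
import uuid
from pathlib import Path

# Default address of the local Flask development server; adjust as needed.
EXTRACT_URL = "http://127.0.0.1:5000/extract_info"


def build_multipart(field_name: str, filename: str, data: bytes, boundary: str) -> bytes:
    """Encode one file as a multipart/form-data body (hypothetical helper)."""
    parts = [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"'.encode(),
        b"Content-Type: application/pdf",
        b"",  # blank line separates the part headers from the file bytes
        data,
        f"--{boundary}--".encode(),  # closing boundary ends the body
        b"",
    ]
    return b"\r\n".join(parts)


def make_extract_request(pdf_path: str, url: str = EXTRACT_URL) -> urllib.request.Request:
    """Build a POST request that uploads the PDF as the "pdf_file" form field."""
    path = Path(pdf_path)
    boundary = uuid.uuid4().hex  # random boundary, unlikely to occur in the PDF bytes
    body = build_multipart("pdf_file", path.name, path.read_bytes(), boundary)
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def extract_info(pdf_path: str, url: str = EXTRACT_URL) -> str:
    """Send the PDF to a running server and return the raw response body."""
    with urllib.request.urlopen(make_extract_request(pdf_path, url)) as response:
        return response.read().decode("utf-8", errors="replace")
```

With the server running, `print(extract_info("/path/to/your/invoice.pdf"))` should print whatever the endpoint returns. Building the body by hand avoids a dependency. If you already use the third-party `requests` package, `requests.post(url, files={"pdf_file": f})` with `f` opened in binary mode sends an equivalent upload.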