A flexible Python web scraper that lets you:
- Input any URL at runtime
- Preview available HTML tags and select which elements to scrape
- Export scraped data to CSV or Excel
- Choose tags dynamically (no hardcoded tag list)
- Confirm before final scraping
- Handle invalid input gracefully
- See the saved file location at the end
This scraper does not currently support JavaScript-rendered pages.
Support for JS-rendered pages (via Selenium or Playwright) is planned for a future release.
- Dynamic Tag Detection: Pre-scrapes the page and lists all available HTML tags
- User-Controlled Scraping: Select which tags you want to scrape
- Multiple Export Options: Save as CSV or Excel
- Error Handling: Handles invalid choices without crashing
- Clear Exit Options: Press q anytime to quit
- File Path Confirmation: Confirms where your files were saved
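To illustrate the tag detection step, here is a minimal sketch of how a page's tags could be listed with BeautifulSoup. The function name `list_available_tags` and the sample HTML are assumptions made for this example and are not taken from `scraper.py`.

```python
from collections import Counter

from bs4 import BeautifulSoup


def list_available_tags(html: str) -> Counter:
    """Count how often each tag name occurs in an HTML document."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all(True) matches every tag in the parsed tree.
    return Counter(tag.name for tag in soup.find_all(True))


# Hypothetical sample page; a real run would parse the fetched page instead.
sample_html = "<html><body><h1>Title</h1><p>One</p><p>Two</p></body></html>"
tag_counts = list_available_tags(sample_html)
for name, count in sorted(tag_counts.items()):
    print(f"{name}: {count}")
```

In a real run the HTML would typically come from something like `requests.get(url, timeout=10).text`. Because only the raw response is parsed, content injected by JavaScript would not appear in the tag list.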
- Python 3.8+
- The following Python libraries (see requirements.txt): requests, beautifulsoup4, pandas
Clone the repository:
    git clone https://github.com/YOUR_USERNAME/python-web-scraper.git
    cd python-web-scraper
Install dependencies:
    pip install -r requirements.txt
Run the scraper:
    python scraper.py

- Enter the URL you want to scrape
- The script previews all HTML tags found
- Select tags to scrape (e.g., p, h1, h2)
- Choose CSV or Excel output
- Confirm and scrape
- Files are saved in the current folder, and the path is displayed at the end
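The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `scraper.py`: the helper names `parse_tag_selection` and `scrape_tags`, the `tag`/`text` column layout, the sample HTML, and the output file name `scraped_data.csv` are all hypothetical.

```python
from pathlib import Path

import pandas as pd
from bs4 import BeautifulSoup


def parse_tag_selection(raw: str, available: set) -> list:
    """Turn input such as 'p, h1' into a validated list of tag names."""
    chosen = [part.strip().lower() for part in raw.split(",") if part.strip()]
    unknown = [tag for tag in chosen if tag not in available]
    if not chosen or unknown:
        # Raising lets the caller re-prompt instead of crashing later.
        raise ValueError(f"Invalid selection: {unknown or 'nothing selected'}")
    return chosen


def scrape_tags(html: str, tags: list) -> pd.DataFrame:
    """Collect the text of every element whose tag name is in `tags`."""
    soup = BeautifulSoup(html, "html.parser")
    rows = [
        {"tag": element.name, "text": element.get_text(strip=True)}
        for element in soup.find_all(tags)  # elements come back in document order
    ]
    return pd.DataFrame(rows, columns=["tag", "text"])


# Hypothetical sample page and user input.
sample_html = "<h1>Title</h1><p>First</p><p>Second</p><a href='#'>Link</a>"
selection = parse_tag_selection("p, h1", available={"h1", "p", "a"})
frame = scrape_tags(sample_html, selection)

# CSV export needs only pandas.
output_path = Path("scraped_data.csv").resolve()
frame.to_csv(output_path, index=False)
print(f"Saved to {output_path}")

# Excel export through pandas needs an engine such as openpyxl installed:
# frame.to_excel("scraped_data.xlsx", index=False)
```

Validating the selection against the previewed tags before scraping is one way to reject a typo such as `dvi` early, so the user can be asked again rather than ending up with an empty export.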
- Add support for JavaScript-rendered pages
- Add a Streamlit web interface for easy use
- Deploy on Streamlit Cloud so anyone can try it online
- Support search by CSS selectors or attributes
Pull requests are welcome! For major changes, please open an issue first to discuss what you'd like to change.