First project I made for fun lol :D
A super-efficient, async URL scanner that checks thousands of URLs from a file (.csv/.txt) for dead links (404s). Use it to clean broken links out of a large URL list.
Features:
- Async scanning with a concurrency limit (default 15) for fast performance (see the sketch after this list)
- Domain filtering — scan all URLs in the file, or target specific domains only
- File cleaning — automatically remove dead URLs from your file (optional) [BETA]
- Backup system — backs up your original CSV before cleaning
- Scan report — lists all dead URLs found
- URL format support — handles `http://`, `https://`, and `www.`
- Stealth headers — mimics real browser requests, bypassing most web-security bots
- Progress bar — live scan progress displayed with tqdm
- Platforms — works on Windows and Linux (not tested on macOS)
- Optimized for low-end hardware — scans 1,000 links in about 5 minutes (tested on Linux | 3rd-gen i5 | 4 GB RAM)
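To give a sense of how this works, here is a minimal, hypothetical sketch of async scanning with an `asyncio.Semaphore` concurrency cap, Playwright page requests, and a tqdm progress bar. It is not the actual code in `DeadURL.py`; names and timeouts are illustrative.

```python
# Minimal sketch (not the project's actual implementation): check URLs
# concurrently with an asyncio.Semaphore cap and a tqdm progress bar.
import asyncio
from playwright.async_api import async_playwright
from tqdm.asyncio import tqdm_asyncio

CONCURRENT_LIMIT = 15  # default concurrency mentioned above


async def check_url(context, sem, url):
    """Open the URL in a fresh page and return (url, HTTP status or None)."""
    async with sem:
        page = await context.new_page()
        try:
            response = await page.goto(url, timeout=15_000)
            return url, response.status if response else None
        except Exception:
            return url, None  # unreachable or timed out
        finally:
            await page.close()


async def scan(urls):
    sem = asyncio.Semaphore(CONCURRENT_LIMIT)
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        context = await browser.new_context()
        results = await tqdm_asyncio.gather(*(check_url(context, sem, u) for u in urls))
        await browser.close()
    # Treat 404s (and unreachable URLs) as dead
    return [url for url, status in results if status in (None, 404)]


# dead = asyncio.run(scan(["https://example.com", "https://example.com/missing"]))
```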
SKIP THIS STEP IF you only want a scan and a scan report generated, and do not want dead URLs cleaned from your file.
If you are running this on Windows, disable the "Controlled Folder Access" feature in Windows Defender before running the cleaning step.
Steps:
- Open Windows Security
- Go to Virus & threat protection
- Click Manage ransomware protection
- Turn off Controlled Folder Access (you may re-enable this after the scan is complete)
To run the scanner:
1. Clone or download this repo
2. Run `pip install -r requirements.txt` (or `py -m pip install -r requirements.txt`, or `python -m pip install -r requirements.txt`) to install the dependencies
3. ⚠️ Run `playwright install` or `py -m playwright install` ⚠️
4. Start the scanner: `py DeadURL.py` or `python DeadURL.py`
5. When prompted:
   - Drop or enter the full path to your CSV file
   - Optionally scan a specific domain (or leave blank to scan all)
   - Choose whether to remove dead URLs from the CSV after the scan (y/n)
6. Wait for the scan to complete
7. Check the generated `scan_results_YYYY-MM-DD_HH-MM-SS.txt` file for 404 errors
8. If cleaning was enabled, your original CSV will be backed up and cleaned of dead URLs (a rough sketch of this step follows below)
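For illustration only, here is a heavily simplified sketch of that last step: writing a timestamped report, backing up the CSV, and dropping the dead rows. The file name `apis.csv`, the `API_Name` column usage, and the one-URL-per-cell assumption are placeholders, not taken from `DeadURL.py`.

```python
# Hypothetical sketch, not DeadURL.py's actual code: build a timestamped report
# name, back up the original CSV, then drop rows whose URL was found dead.
import shutil
from datetime import datetime

import pandas as pd

report_name = datetime.now().strftime("scan_results_%Y-%m-%d_%H-%M-%S.txt")

csv_path = "apis.csv"                        # placeholder input file
dead_urls = {"https://example.com/missing"}  # in practice, collected by the scan

with open(report_name, "w") as fh:           # write the scan report
    fh.write("\n".join(sorted(dead_urls)))

shutil.copy(csv_path, csv_path + ".bak")     # backup before modifying anything

df = pd.read_csv(csv_path)
df = df[~df["API_Name"].isin(dead_urls)]     # simplified: one URL per cell
df.to_csv(csv_path, index=False)
```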
Notes:
- The concurrency limit can be adjusted by changing `CONCURRENT_LIMIT` at the top of the script for faster or slower scanning, depending on your hardware/network (see the sketch after this list)
- The User-Agent and headers are set in `STEALTH_HEADERS` — you can update them to mimic different browsers or add custom headers if needed
- The script uses Playwright to simulate browser requests for better accuracy than simple HTTP requests
- URLs are sanitized to accept only those starting with `http`, `https`, or `www.` — this can be tweaked inside `sanitize_url()`
- File cleaning will remove only the dead URLs and save a backup automatically
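As a rough reference, the kind of tweakable constants and sanitizer described above might look like this. The header values and the `www.` handling are assumptions; check `DeadURL.py` for the real ones.

```python
# Illustrative only: the real constants and function live in DeadURL.py.
CONCURRENT_LIMIT = 15  # raise for faster hardware/networks, lower for weaker ones

STEALTH_HEADERS = {
    # Example browser-like headers; swap in whatever browser you want to mimic.
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}


def sanitize_url(raw):
    """Accept only http://, https://, or www. URLs; prefix www. with https://."""
    url = raw.strip()
    if url.startswith(("http://", "https://")):
        return url
    if url.startswith("www."):
        return "https://" + url
    return None  # anything else is ignored
```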
Input file format:
- The file should contain URLs, ideally with a column named `API_Name` (optional)
- URLs can start with `http://`, `https://`, or `www.`
- URLs can be separated by commas, spaces, or new lines inside cells (see the example after this list)
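As a hypothetical example of handling that layout (the file name `apis.csv` and the exact splitting logic are assumptions, not the script's code), cells holding several URLs can be split like this with pandas:

```python
# Hypothetical example: read the file with pandas and split cells that hold
# several URLs separated by commas, spaces, or new lines.
import re

import pandas as pd

df = pd.read_csv("apis.csv")                 # placeholder file name
cells = df["API_Name"].dropna().astype(str)  # the column is optional in practice
urls = [u for cell in cells for u in re.split(r"[,\s]+", cell) if u]
```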
Requirements:
- Python 3.7+
- Playwright
- pandas
- tqdm
- requests
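For reference, a minimal `requirements.txt` covering the packages above (the file step 2 installs from) could be as simple as this; the repo's own file may pin specific versions:

```text
playwright
pandas
tqdm
requests
```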
MIT License
Pull requests and issues are welcome!
This project is still under development, so issues may arise :(