This project is a Python web scraper that extracts movie information from The Movie Database (TMDB). Using Requests and BeautifulSoup, it collects movie titles, ratings, genres, and cast details, organizes them into Pandas DataFrames, and exports the result to a CSV file for further analysis.
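As a rough illustration of the extraction step, the sketch below parses a small HTML snippet with BeautifulSoup. The markup, the CSS selectors, the `parse_movies` name, and the "Example Movie" row are all hypothetical; TMDB's real pages are structured differently and change over time, so the selectors in `main.py` will not match these.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; this is NOT TMDB's real HTML.
SAMPLE_HTML = """
<div class="card">
  <h2><a href="#">The Godfather</a></h2>
  <span class="rating">9.2</span>
</div>
<div class="card">
  <h2><a href="#">Example Movie</a></h2>
  <span class="rating">7.5</span>
</div>
"""


def parse_movies(html):
    """Return one {"Title", "Rating"} dict per movie card found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    movies = []
    for card in soup.select("div.card"):
        title_tag = card.select_one("h2 a")
        rating_tag = card.select_one("span.rating")
        if title_tag is None:
            continue  # skip cards that have no title link
        rating = float(rating_tag.get_text(strip=True)) if rating_tag else None
        movies.append({"Title": title_tag.get_text(strip=True), "Rating": rating})
    return movies


if __name__ == "__main__":
    for movie in parse_movies(SAMPLE_HTML):
        print(movie)
```

Returning plain dictionaries keeps the parser independent of Pandas, so the rows can be turned into a DataFrame later in one step.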
- Web Scraping: Extracts movie details from multiple pages of the TMDB website.
- Data Storage: Combines data into Pandas DataFrames and exports as CSV.
- Error Handling: Implements robust mechanisms for handling request failures.
- Reusable Functions: Includes modular user-defined functions for easy extensibility.
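The error-handling feature listed above could be sketched along these lines. This is one possible approach, assuming a simple retry loop with a growing delay; the `fetch_page` name, its parameters, and the retry policy are illustrative and not taken from `main.py`.

```python
import time

import requests


def fetch_page(url, session=None, retries=3, backoff=1.0, timeout=10):
    """Return the page HTML as text, or None if every attempt fails."""
    session = session or requests.Session()
    for attempt in range(1, retries + 1):
        try:
            response = session.get(url, timeout=timeout)
            response.raise_for_status()  # raise an exception on 4xx/5xx responses
            return response.text
        except requests.RequestException as exc:
            # RequestException is the base class for connection errors,
            # timeouts, and HTTP errors raised by raise_for_status().
            print(f"Attempt {attempt}/{retries} failed for {url}: {exc}")
            if attempt < retries:
                time.sleep(backoff * attempt)  # wait a little longer each time
    return None
```

Returning `None` instead of raising lets the caller skip a failed page and continue with the remaining ones, which suits a multi-page scrape.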
Ensure you have the following installed:
- Python 3.7+
- Pip (Python package manager)
- Clone this repository:
    git clone <repository_url>
    cd tmdb-movie-data-scraper
- Install the required Python libraries:
    pip install -r requirements.txt
- Run the script:
    python main.py
The script fetches data from the first 6 pages of TMDB and combines the results into a single CSV file.
You can customize the number of pages to scrape or adjust headers by editing the main.py script.
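The settings mentioned above might look something like the following. The constant names, the header values, and the URL pattern are assumptions made for illustration; check `main.py` for the names it actually uses.

```python
# Hypothetical settings; names and URL pattern are assumptions, not copied from main.py.
BASE_URL = "https://www.themoviedb.org/movie"
NUM_PAGES = 6  # default described above: the first 6 pages
HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible; tmdb-movie-data-scraper)",
    "Accept-Language": "en-US,en;q=0.9",
}


def page_urls(num_pages=NUM_PAGES, base_url=BASE_URL):
    """Build one listing URL per page, numbered from 1."""
    return [f"{base_url}?page={n}" for n in range(1, num_pages + 1)]


if __name__ == "__main__":
    for url in page_urls(3):
        print(url)
```

Keeping the page count and headers in module-level constants means a change to either one touches a single line rather than every request call.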
The combined movie data is saved as Combined_Data.csv in the project directory.
- CSV File: Contains the following columns:
  - Title
  - Rating
  - Genre(s)
  - Cast
Example output:
| Title | Rating | Genres | Cast |
|---|---|---|---|
| The Shawshank... | 9.3 | Drama, Crime | Tim Robbins, ... |
| The Godfather | 9.2 | Drama, Crime | Marlon Brando,... |
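A minimal sketch of how per-page results could be combined and written to `Combined_Data.csv` with Pandas. The function names, the column list (taken from the example table above), and the sample rows are illustrative assumptions rather than the script's actual code.

```python
import pandas as pd

# Column names follow the example table; the real script may differ.
COLUMNS = ["Title", "Rating", "Genres", "Cast"]


def combine_pages(pages):
    """Concatenate per-page lists of row dicts into a single DataFrame."""
    frames = [pd.DataFrame(rows, columns=COLUMNS) for rows in pages if rows]
    if not frames:
        return pd.DataFrame(columns=COLUMNS)  # nothing scraped: empty table
    # ignore_index=True renumbers rows 0..n-1 across all pages
    return pd.concat(frames, ignore_index=True)


def save_csv(df, path="Combined_Data.csv"):
    """Write the DataFrame to CSV without the index column."""
    df.to_csv(path, index=False)


if __name__ == "__main__":
    # Made-up rows standing in for two scraped pages.
    page1 = [{"Title": "The Godfather", "Rating": 9.2,
              "Genres": "Drama, Crime", "Cast": "Marlon Brando"}]
    page2 = [{"Title": "Example Movie", "Rating": 7.5,
              "Genres": "Comedy", "Cast": "Jane Doe"}]
    save_csv(combine_pages([page1, page2]))
```

Passing `index=False` keeps the DataFrame's row numbers out of the file, so the CSV contains only the four data columns.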
- Python
- Requests
- BeautifulSoup
- Pandas
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a feature branch.
- Submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
- The Movie Database (TMDB) for providing the data.