🎬 IMDb Top 250 Movies Web Scraper

This project demonstrates how to extract movie details from IMDb's Top 250 chart using Selenium and BeautifulSoup. The scraped data is processed and stored in a structured CSV file for further analysis.

🛠 Setup

Before running the project, set up a Python virtual environment and install dependencies:

pip install selenium pandas numpy beautifulsoup4

Project Structure

o main.py: Uses Selenium to load IMDb's Top 250 chart, extracts the raw HTML of each movie entry, and saves each entry as a separate file inside the data/ folder.

o collect.py: Parses the saved HTML files to extract movie metadata (Title, Year, Duration, content Rating, IMDb Score, Link) and compiles it into a single CSV file, imdb_top250_movies.csv.
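The parsing step in collect.py can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample markup, the class names (movie-entry, title-link, meta, score) and the parse_entry helper are all hypothetical, and IMDb's real HTML will differ.

```python
from bs4 import BeautifulSoup

# Hypothetical snapshot of one saved movie entry. The tag and class
# names are assumptions made for this sketch, not IMDb's real markup.
SAMPLE_ENTRY = """
<li class="movie-entry">
  <a class="title-link" href="/title/tt0000000/"><h3>Example Movie</h3></a>
  <span class="meta">1999</span>
  <span class="meta">2h 10m</span>
  <span class="meta">PG-13</span>
  <span class="score">8.5</span>
</li>
"""

BASE_URL = "https://www.imdb.com"


def parse_entry(html: str) -> dict:
    """Turn one saved HTML entry into a row of movie metadata."""
    soup = BeautifulSoup(html, "html.parser")

    link = soup.find("a", class_="title-link")
    title = link.find("h3").get_text(strip=True)

    # The three metadata spans are assumed to appear in this order.
    meta = [s.get_text(strip=True) for s in soup.find_all("span", class_="meta")]
    year, duration, rating = meta

    score = float(soup.find("span", class_="score").get_text(strip=True))

    return {
        "Title": title,
        "Year": int(year),
        "Duration": duration,
        "Rating": rating,
        "Score": score,
        "Link": BASE_URL + link["href"],
    }


row = parse_entry(SAMPLE_ENTRY)
print(row)
```

If the real selectors are swapped in, the same function can be applied to every file in data/ and the resulting rows collected into a list before writing the CSV.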

Features

o Extracts:

▪ 🎬 Title
▪ 🗓 Year of release
▪ ⏱ Duration
▪ 📛 Content rating
▪ ⭐ IMDb score
▪ 🔗 Direct link to the movie page

o Saves HTML snapshots for offline inspection

o Robust error handling for parser failures
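One way such error handling could look is sketched below: each saved file is parsed inside a try/except, so a single malformed snapshot is recorded and skipped instead of aborting the whole run. The collect_rows function, the toy_parse stand-in and the chosen exception types are assumptions for illustration; the project's own handling may differ.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Tuple


def collect_rows(
    data_dir: Path, parse: Callable[[str], Dict]
) -> Tuple[List[Dict], List[str]]:
    """Parse every .html file in data_dir, skipping files that fail.

    Returns the parsed rows and the names of the files that failed.
    """
    rows: List[Dict] = []
    failed: List[str] = []
    for path in sorted(data_dir.glob("*.html")):
        try:
            rows.append(parse(path.read_text(encoding="utf-8")))
        except (AttributeError, IndexError, KeyError, ValueError) as exc:
            # A missing tag or an unexpected value lands here.
            print(f"Skipping {path.name}: {exc}")
            failed.append(path.name)
    return rows, failed


def toy_parse(html: str) -> Dict:
    """Hypothetical stand-in for the real parser: rejects empty files."""
    if not html.strip():
        raise ValueError("empty snapshot")
    return {"Title": html.strip()}


# Demo on a temporary folder with one good and one broken snapshot.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "movie_1.html").write_text("Example Movie", encoding="utf-8")
    (folder / "movie_2.html").write_text("", encoding="utf-8")
    rows, failed = collect_rows(folder, toy_parse)

print(rows, failed)
```

Returning the list of failed file names makes it easy to reopen the corresponding HTML snapshots and inspect what the parser could not handle.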

Output

The final result is stored in:

imdb_top250_movies.csv
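As a rough sketch of how the CSV might be consumed afterwards, the snippet below reads rows with Python's standard csv module (used here instead of pandas only to keep the example dependency-free). The column names are assumed to match the metadata fields listed above, and the two sample rows are placeholders, not real scraped data.

```python
import csv
import io

# Placeholder content standing in for imdb_top250_movies.csv.
# The header names are an assumption based on the fields listed above.
SAMPLE_CSV = """Title,Year,Duration,Rating,Score,Link
Example Movie,1999,2h 10m,PG-13,8.5,https://www.imdb.com/title/tt0000000/
Another Example,2005,1h 45m,R,8.1,https://www.imdb.com/title/tt0000001/
"""


def load_movies(handle):
    """Read movie rows from an open CSV handle, converting numeric fields."""
    movies = []
    for row in csv.DictReader(handle):
        row["Year"] = int(row["Year"])
        row["Score"] = float(row["Score"])
        movies.append(row)
    return movies


movies = load_movies(io.StringIO(SAMPLE_CSV))
top = max(movies, key=lambda m: m["Score"])
print(top["Title"], top["Score"])
```

To read the real file, open it with open("imdb_top250_movies.csv", newline="", encoding="utf-8") and pass the handle to load_movies.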

Repository: HimanshuBhamaniya/IMDb-Web-scraping-using-selenium
