
HimanshuBhamaniya/IMDb-Web-scraping-using-selenium


🎬 IMDb Top 250 Movies Web Scraper

This project demonstrates how to extract movie details from IMDb's Top 250 chart using Selenium and BeautifulSoup. The scraped data is processed and stored in a structured CSV file for further analysis.

🛠 Setup

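Before running the project, create and activate a Python virtual environment, then install the dependencies:

python -m venv venv
source venv/bin/activate    # on Windows: venv\Scripts\activate
pip install selenium pandas numpy beautifulsoup4

Selenium also needs a browser such as Chrome. Selenium 4.6 and later can download a matching driver automatically through Selenium Manager.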

Project Structure

o main.py: Uses Selenium to load IMDb's Top 250 chart, extract the raw HTML of each of the 250 movie entries, and save each entry as a separate file in the data/ folder.

o collect.py: Parses the saved HTML files to extract each movie's metadata (Title, Year, Duration, Rating, Score, Link) and compiles it into a CSV file, imdb_top250_movies.csv.

Run main.py before collect.py, since collect.py reads the files that main.py writes to data/.
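A minimal sketch of what main.py might look like is shown below. It is not the author's code: the chart URL, the CSS selector li.ipc-metadata-list-summary-item for a chart row, the 20-second wait, and the movie_001.html file naming are all assumptions, and IMDb changes its markup often. IMDb may also render rows lazily, so scrolling the page before collecting could be needed.

```python
from pathlib import Path

# Assumed CSS selector for one movie row on the chart page.
ROW_SELECTOR = "li.ipc-metadata-list-summary-item"


def scrape_top250(url="https://www.imdb.com/chart/top/"):
    """Open the chart in Chrome and return the outer HTML of every movie row."""
    # Selenium is imported here so save_entries() works without it installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Wait up to 20 seconds for the rows to appear; until() returns them.
        rows = WebDriverWait(driver, 20).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, ROW_SELECTOR))
        )
        return [row.get_attribute("outerHTML") for row in rows]
    finally:
        driver.quit()  # always close the browser, even if an error occurred


def save_entries(html_snippets, out_dir="data"):
    """Write each snippet to its own file (movie_001.html, movie_002.html, ...)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, html in enumerate(html_snippets, start=1):
        path = out / f"movie_{i:03d}.html"
        path.write_text(html, encoding="utf-8")
        paths.append(path)
    return paths
```

Running the script then amounts to calling save_entries(scrape_top250()). Zero-padding the file numbers keeps a plain alphabetical sort of data/ in chart order.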

Features

o Extracts:

 🎬 Title
 🗓 Year of release
 ⏱ Duration
 📛 Content rating
 ⭐ IMDb score
 🔗 Direct link to the movie page

o Saves HTML snapshots for offline inspection

o Robust error handling for parser failures
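The extraction and error handling listed above could look roughly like the following sketch of collect.py. The class names (ipc-title__text, cli-title-metadata-item, ipc-rating-star--rating, ipc-title-link-wrapper) and the year/duration/rating order of the metadata spans are assumptions about IMDb's markup, not taken from the project. A missing element leaves its field empty instead of raising, and a file that cannot be read or parsed is skipped with a message. The project lists pandas as a dependency; the standard csv module is used here only to keep the sketch small.

```python
import csv
import re
from pathlib import Path

from bs4 import BeautifulSoup

FIELDS = ["Title", "Year", "Duration", "Rating", "Score", "Link"]
BASE_URL = "https://www.imdb.com"


def _text(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


def parse_entry(html):
    """Extract one movie's fields from a saved chart row (selectors are assumptions)."""
    soup = BeautifulSoup(html, "html.parser")
    record = dict.fromkeys(FIELDS)  # every field starts as None

    title = _text(soup.select_one("h3.ipc-title__text"))
    # Chart titles carry a rank prefix such as "1. "; strip it if present.
    record["Title"] = re.sub(r"^\d+\.\s*", "", title) if title else None

    # Assumed order of the metadata spans: year, duration, content rating.
    meta = [_text(span) for span in soup.select("span.cli-title-metadata-item")]
    for key, value in zip(["Year", "Duration", "Rating"], meta):
        record[key] = value

    record["Score"] = _text(soup.select_one("span.ipc-rating-star--rating"))

    link = soup.select_one("a.ipc-title-link-wrapper")
    if link is not None and link.get("href"):
        # Drop the tracking query string, e.g. "?ref_=chttp_t_1".
        record["Link"] = BASE_URL + link["href"].split("?", 1)[0]
    return record


def collect(data_dir="data", out_csv="imdb_top250_movies.csv"):
    """Parse every saved file and write the records to a CSV file."""
    rows = []
    for path in sorted(Path(data_dir).glob("*.html")):
        try:
            rows.append(parse_entry(path.read_text(encoding="utf-8")))
        except Exception as exc:  # skip a bad file instead of aborting the run
            print(f"Skipping {path.name}: {exc}")
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)  # None values are written as empty cells
    return rows
```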

Output

The final result is stored in:

imdb_top250_movies.csv
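As one example of further analysis, the Duration column can be turned into a number of minutes. This sketch assumes durations are stored in IMDb's "2h 22m" style; duration_to_minutes and load_movies are hypothetical helpers, not part of the project.

```python
import csv
import re


def duration_to_minutes(text):
    """Convert an assumed IMDb-style duration ("2h 22m", "1h", "58m") to minutes."""
    if not isinstance(text, str):
        return None  # missing value (None, or NaN when read with pandas)
    match = re.fullmatch(r"\s*(?:(\d+)h)?\s*(?:(\d+)m)?\s*", text)
    if match is None or not any(match.groups()):
        return None  # empty or unrecognised format
    hours, minutes = match.groups()
    return int(hours or 0) * 60 + int(minutes or 0)


def load_movies(csv_path="imdb_top250_movies.csv"):
    """Read the scraper's CSV and add a numeric Minutes value to each row."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        row["Minutes"] = duration_to_minutes(row.get("Duration"))
    return rows
```

With pandas, the same helper can be applied column-wise with pd.read_csv("imdb_top250_movies.csv")["Duration"].map(duration_to_minutes); the isinstance check is what lets it tolerate the NaN values pandas uses for empty cells.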

About

Web scraping using Selenium and BeautifulSoup.
