ETL project using Selenium & Scrapy: automates extracting data from vivanuncios.com.mx and loading it into a SQLite database using Python and SQL.

cesalomx/Vivanuncios-ETL-Web-Scraping

ETL Vivanuncios.com.mx Pipeline

The main purpose of this project is to build a SQLite database storing data extracted from vivanuncios.com.mx, specifically listings in Querétaro. The first step extracts the data by web-scraping with Selenium & Scrapy; the second step appends it to a SQLite database by running a pipeline.

Installation

To run this spider, the following libraries must be installed:

pip install scrapy
pip install scrapy_selenium
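
For scrapy_selenium to take effect, the project's settings.py must also register the Selenium downloader middleware and point Scrapy at a WebDriver. A minimal sketch of that wiring follows; the driver name, executable path, and arguments are assumptions about the local environment, not values taken from this repository:

```python
# settings.py (excerpt): wiring scrapy_selenium into the project.
# Driver name, path, and arguments below are illustrative assumptions.
SELENIUM_DRIVER_NAME = "chrome"
SELENIUM_DRIVER_EXECUTABLE_PATH = "/usr/local/bin/chromedriver"
SELENIUM_DRIVER_ARGUMENTS = ["--headless"]  # run the browser without a UI

# Enable the scrapy_selenium middleware so SeleniumRequest objects
# are rendered through the browser instead of Scrapy's default downloader.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_selenium.SeleniumMiddleware": 800,
}
```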

Files

  • homes.py: The main Python file, containing the code that fetches the data from vivanuncios.com.mx.
  • pipeline.py: A simple pipeline that executes the ETL process for the items scraped by homes.py.
  • settings.py: Contains only the settings considered important or commonly used.
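
The ETL step in pipeline.py can be sketched as a standard Scrapy item pipeline that appends each scraped listing to a SQLite table. This is a sketch under assumptions: the class name, database filename, table name (homes), and columns (title, price, location) are illustrative, not the repository's actual schema:

```python
import sqlite3


class SQLitePipeline:
    """Illustrative Scrapy item pipeline: appends each scraped
    listing to a SQLite database (all names here are assumptions)."""

    def __init__(self, db_path="vivanuncios.db"):
        self.db_path = db_path

    def open_spider(self, spider):
        # Called once when the spider starts: open the connection
        # and create the table if it does not exist yet.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS homes "
            "(title TEXT, price TEXT, location TEXT)"
        )

    def close_spider(self, spider):
        # Commit pending inserts and close the connection on shutdown.
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        # Append one listing; Scrapy calls this for every yielded item.
        self.conn.execute(
            "INSERT INTO homes VALUES (?, ?, ?)",
            (item["title"], item["price"], item["location"]),
        )
        return item
```

A pipeline like this is enabled through the ITEM_PIPELINES setting in settings.py, which is why the two files work together.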

To run the spider:

scrapy crawl homes

(Screenshot: the resulting SQLite database output.)
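
Once the pipeline has run, the resulting database can be inspected directly with Python's built-in sqlite3 module. A small sketch, assuming the vivanuncios.db filename and homes table used above (the CREATE TABLE line is a no-op if the pipeline already created the table, and only lets the snippet run standalone):

```python
import sqlite3

# Open the database produced by the pipeline (filename is an assumption).
conn = sqlite3.connect("vivanuncios.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS homes (title TEXT, price TEXT, location TEXT)"
)

# Count the stored listings and preview a few rows.
total = conn.execute("SELECT COUNT(*) FROM homes").fetchone()[0]
print(f"{total} listings stored")
for row in conn.execute("SELECT * FROM homes LIMIT 5"):
    print(row)
conn.close()
```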
