pypi-info-scraper

This repository contains the script used to automatically collect the data for an academic research paper on FPGA-related packages on the Python Package Index (PyPI).

The paper will be published in the 10th South-East Europe Design Automation, Computer Engineering, Computer Networks and Social Media Conference (SEEDA-CECNSM 2025).

The paper is titled "The State of Python-Based FPGA Development: A PyPI Repository Study".

Prerequisites

The script should be compatible with most operating systems.

Handling Anti-Scraping Measures

Because the Python Package Index (PyPI) has anti-scraping measures in place, the user must complete a challenge manually the first time the script opens the browser window.

Specifically, user intervention is needed when the script loads the first page (page 1) of the "fpga" keyword search. The script allows extra time on this page so that the user can complete the challenge.
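The longer wait on the first search page can be sketched roughly as follows. This is a minimal illustration, not the code in the repository: the function names, the `load_page` callable (standing in for whatever browser automation the script uses), and the wait durations of 60 and 5 seconds are all assumptions.

```python
import time
from urllib.parse import urlencode

SEARCH_URL = "https://pypi.org/search/"


def search_page_url(keyword: str, page: int) -> str:
    """Build the PyPI search URL for one results page (pages start at 1)."""
    return f"{SEARCH_URL}?{urlencode({'q': keyword, 'page': page})}"


def wait_seconds(page: int, first_page_wait: float = 60.0,
                 later_wait: float = 5.0) -> float:
    """Return a long wait for page 1 (manual challenge) and a short one after.

    Both durations are hypothetical values chosen for illustration.
    """
    return first_page_wait if page == 1 else later_wait


def visit_search_pages(keyword, last_page, load_page, pause=time.sleep):
    """Open each results page in turn, pausing after every load.

    `load_page` is a hypothetical callable that opens a URL in the
    automated browser; `pause` is injectable so the loop can be tested
    without actually sleeping.
    """
    for page in range(1, last_page + 1):
        load_page(search_page_url(keyword, page))
        pause(wait_seconds(page))
```

Passing the browser call in as a parameter keeps the timing logic independent of any particular automation library.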

Subsequent requests for the following search pages (page 2, 3, etc.) require no further intervention. Likewise, no anti-scraping measures need to be handled in the next stage of the process, where the JSON data for each package in the list is retrieved.
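The JSON retrieval stage could look something like the sketch below, which uses PyPI's public JSON endpoint (`https://pypi.org/pypi/<name>/json`). The function names and the choice of fields in `summarize` are assumptions made for illustration; the actual script may collect different fields or use a different HTTP library.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def package_json_url(name: str) -> str:
    """Return the PyPI JSON API URL for a package name."""
    return f"https://pypi.org/pypi/{quote(name)}/json"


def fetch_package_json(name: str, timeout: float = 30.0) -> dict:
    """Download and decode the JSON metadata for one package."""
    with urlopen(package_json_url(name), timeout=timeout) as response:
        return json.load(response)


def summarize(payload: dict) -> dict:
    """Flatten a JSON API payload into one record per package.

    The selected fields are an illustrative subset, not the list used
    in the paper.
    """
    info = payload.get("info", {})
    return {
        "name": info.get("name"),
        "version": info.get("version"),
        "summary": info.get("summary"),
        "license": info.get("license"),
        "release_count": len(payload.get("releases", {})),
    }
```

A record for a single package would then be obtained with `summarize(fetch_package_json(name))` for each name collected from the search pages.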

Output Format

The scraping results are saved in a structured Excel file.

The script was executed at the beginning of 2025 to acquire the data for the statistical review described above.

You can find the scraped data in pypi_packages.xlsx.
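Writing per-package records to an Excel file can be sketched as below. This assumes the third-party `openpyxl` library, which the repository may or may not use; the function names and column names are likewise hypothetical.

```python
def records_to_rows(records, columns):
    """Turn a list of dict records into a header row plus data rows.

    Missing fields are written as empty strings so every row has the
    same number of cells.
    """
    header = list(columns)
    rows = [[record.get(col, "") for col in header] for record in records]
    return [header] + rows


def save_xlsx(records, columns, path):
    """Write the records to a single-sheet .xlsx workbook."""
    # openpyxl is an assumed dependency; it is imported here so the
    # row-building helper above works without it installed.
    from openpyxl import Workbook

    workbook = Workbook()
    sheet = workbook.active
    for row in records_to_rows(records, columns):
        sheet.append(row)
    workbook.save(path)
```

For example, `save_xlsx(records, ["name", "version", "license"], "output.xlsx")` would produce one row per package under a header row.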

Utility and Significance

This scraper can be customized to automatically gather basic statistics on packages hosted on PyPI, which should aid future research of this kind.
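As one example of such a statistic, the sketch below counts how often each value of a metadata field occurs across the collected records. The function name, the `license` field, and the placeholder label for missing values are assumptions for illustration only.

```python
from collections import Counter


def count_by_field(records, field, missing="(unspecified)"):
    """Count records per value of `field`, grouping empty or absent values."""
    return Counter((record.get(field) or missing) for record in records)
```

Calling `count_by_field(records, "license").most_common()` would list the values from most to least frequent.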

The script is distributed under the MIT license.

Publication

The paper can be found on IEEE Xplore: TO BE INCLUDED
