The repository contains the script used for the automatic data collection necessary for an academic research paper on FPGA-related packages on the Python Package Index (PyPI).
The paper will be published in the 10th South-East Europe Design Automation, Computer Engineering, Computer Networks and Social Media Conference (SEEDA-CECNSM 2025).
The paper is titled "The State of Python-Based FPGA Development: A PyPI Repository Study".
The script should be compatible with most operating systems.
Since the Python Package Index (PyPI) has anti-scraping measures in place, a user challenge is necessary the first time the browser window is opened automatically from the Python script.
In that sense, when the script performs the "fpga" keyword search for the first page (page 1), user intervention is necessary. The user is given more time to perform the necessary actions.
Subsequent requests for the next pages of the search (page 2, 3, etc.) do not require further intervention. Additionally, no anti-scraping measures are required in the next stage of the process, where the JSON data for each package in the list is retrieved.
The scraping results are saved in a structured Excel file.
The script was executed at the beginning of 2025 in order to acquire the information necessary for writing the aforementioned statistical review.
You can find the scraped data in pypi_packages.xlsx.
Basic statistics on packages hosted on PyPI can be gathered automatically by customizing this scraper.
This should aid future research of that nature.
The script is distributed under the MIT license.
The paper can be found on IEEE Xplore: TO BE INCLUDED