[ENH] new data loaders, replace `yfinance` downloads #678

tschm · 2025-11-15T07:42:53Z

Get rid of yfinance. Create one csv file serving all notebooks. Closes #677

update from central main

tschm · 2025-11-15T09:05:09Z

@fkiraly this is good to be merged. Now using one flat file common for all notebooks. I have added the script to generate this file.

fkiraly

Great! I really think we need to replace downloads by onboard data.

I believe you have to explicitly request csv to be packaged in pyproject, or users will not be able to load the data. This becomes apparent in the wheels/release test (but we are not running a packaging test regularly).
Could you do separate things in separate PRs? E.g., linting the notebooks, changing dependencies, etc.
The pattern that I would use is adding a load_datasetname function instead of directly loading a csv. That makes the notebook easier to read.

Different topic, I think the pip installs should be removed from the notebooks, this messes up the VM that tests the notebooks.

fkiraly · 2025-11-15T10:21:33Z

Plus, could you please, please write descriptive summaries for your PR? Use AI if you need to.

tschm · 2025-11-15T10:43:40Z

I don't understand. Do you want to include the notebooks in the package released?

tschm · 2025-11-15T10:46:22Z

import os
if not os.path.isdir('data'):
    os.system('git clone https://github.com/pyportfolio/pyportfolioopt.git')
    os.chdir('PyPortfolioOpt/cookbook')

??? why would one have this in a notebook?

fkiraly · 2025-11-15T22:56:41Z

> I don't understand. Do you want to include the notebooks in the package released?

no, I want loader functions for the csv in the package release, and the notebooks then import the loader from the package, rather than loading the csv directly.

from pypfopt.data import load_something

my_dummydata = load_something()

All the csv manipulation is hidden underneath, ensuring that the notebook is short and does not distract from what is being shown.
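
A minimal sketch of that loader pattern, assuming the csv ships inside the package. The package name `demo_pkg`, the file `data/prices.csv`, and the body of `load_something` are hypothetical; the demo builds a throwaway package in a temp directory so it runs standalone, and it uses the stdlib `csv` module where a real loader would more likely return a pandas DataFrame.

```python
import csv
import sys
import tempfile
from importlib import resources
from pathlib import Path


def load_something(package="demo_pkg", filename="prices.csv"):
    """Return the rows of a csv bundled under `<package>/data/` as a list of dicts."""
    # resources.files() locates the importable package, so the notebook
    # never needs to know where the file lives on disk.
    source = resources.files(package) / "data" / filename
    with source.open("r", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


# --- demo only: build a throwaway package holding made-up numbers ---
root = Path(tempfile.mkdtemp())
data_dir = root / "demo_pkg" / "data"
data_dir.mkdir(parents=True)
(root / "demo_pkg" / "__init__.py").write_text("")
(data_dir / "prices.csv").write_text(
    "date,A,B\n2020-01-02,10.0,20.0\n2020-01-03,10.5,19.5\n"
)
sys.path.insert(0, str(root))

my_dummydata = load_something()
print(my_dummydata[0])
```

In an installed wheel this only works if the csv is declared as package data, which is the packaging point raised earlier in the review.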

tschm · 2025-11-16T03:11:12Z

> > I don't understand. Do you want to include the notebooks in the package released?
>
> no, I want loader functions for the csv in the package release, and the notebooks then import the loader from the package, rather than loading the csv directly.
>
> from pypfopt.data import load_something
>
> my_dummydata = load_something()
>
> All the csv manipulation is hidden underneath, ensuring that the notebook is short and does not distract from what is being shown.

Could be done, but I would still not put the csv file into the package. I would do something like

load_prices("my_file.csv", ticker=["A", "B", "C"], start="1990-01-01")
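
A rough sketch of what such a `load_prices` could look like, under stated assumptions: the file is a wide table with a `date` column plus one column per ticker, `start` is an ISO date string, and the function also accepts an open text file so the demo can run without a file on disk. None of this is an existing `pypfopt` API, and the sample numbers are made up.

```python
import csv
import io
from datetime import date


def load_prices(source, ticker=None, start=None):
    """Read a wide price table and return {day: {ticker: price}}.

    `source` is a path or an open text file, `ticker` optionally restricts
    the columns, and `start` is an ISO date string such as "1990-01-01".
    """
    start_date = date.fromisoformat(start) if start else None
    is_path = isinstance(source, str)
    handle = open(source, newline="", encoding="utf-8") if is_path else source
    try:
        reader = csv.DictReader(handle)
        columns = ticker if ticker is not None else [
            name for name in reader.fieldnames if name != "date"
        ]
        prices = {}
        for row in reader:
            day = date.fromisoformat(row["date"])
            if start_date is not None and day < start_date:
                continue  # drop history before the requested start
            prices[day] = {name: float(row[name]) for name in columns}
        return prices
    finally:
        if is_path:
            handle.close()  # only close files this function opened


# Demo with made-up numbers instead of a real file.
sample = io.StringIO(
    "date,A,B,C\n"
    "1989-12-29,1.0,2.0,3.0\n"
    "1990-01-02,1.1,2.1,3.1\n"
)
prices = load_prices(sample, ticker=["A", "C"], start="1990-01-01")
print(prices)
```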

fkiraly · 2025-11-16T12:37:01Z

> Could be done, but I would still not put the csv file into the package.

How large are they? If the files are too large, could we cut them down? I think 1MB of example data is ok for a package. IMO it is a nice user experience to have some test data shipped with a package, for testing or learning how to use it.

tschm · 2025-11-16T13:25:32Z

> > Could be done, but I would still not put the csv file into the package.
>
> How large are they? If the files are too large, could we cut them down? I think 1MB of example data is ok for a package. IMO it is a nice user experience to have some test data shipped with a package, for testing or learning how to use it.

More like 6.6 MB at the moment. We could change the examples and use the same tickers across... and maybe less history...

tschm · 2025-11-16T13:27:56Z

But this functionality would only be used by people "developing" the package. For users I don't see the point of having data in there. We would also be on thin ice legally as you can not "distribute" financial data :-) For that purpose I should also remove the last year from the data...

fkiraly · 2025-11-18T21:49:00Z

> For users I don't see the point of having data in there

The point is running the examples, so users can play around with the python code and the data sets.

> We would also be on thin ice legally as you can not "distribute" financial data :-) For that purpose I should also remove the last year from the data...

You are absolutely right! Thanks for pointing this out!!

How about we replace the downloader by some very simple simulated data then, to avoid any legal liabilities that go with using yfinance or data from it?

tschm · 2025-11-19T05:08:58Z

> > For users I don't see the point of having data in there
>
> The point is running the examples, so users can play around with the python code and the data sets.
>
> > We would also be on thin ice legally as you can not "distribute" financial data :-) For that purpose I should also remove the last year from the data...
>
> You are absolutely right! Thanks for pointing this out!!
>
> How about we replace the downloader by some very simple simulated data then, to avoid any legal liabilities that go with using yfinance or data from it?

I think we don't need to be more Catholic than the Pope. I would simply delete the data from this year. The yfinance package itself has plenty of authentic files in its test resources. We could also hash the tickers, but that would all be tedious. Nobody will complain, and if someone does, we can certainly react.
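
For reference, the ticker-hashing idea could look roughly like the sketch below. The helper name `anonymise_ticker`, the salt, and the `T_` prefix are all hypothetical and only illustrate the mechanics of replacing column names with stable pseudonyms.

```python
import hashlib


def anonymise_ticker(ticker, salt="example-salt", length=8):
    """Map a ticker to a short, stable pseudonym derived from a salted SHA-256."""
    digest = hashlib.sha256(f"{salt}:{ticker}".encode("utf-8")).hexdigest()
    return f"T_{digest[:length]}"


tickers = ["A", "B", "C"]
# The same ticker always maps to the same pseudonym for a given salt.
mapping = {ticker: anonymise_ticker(ticker) for ticker in tickers}
print(mapping)
```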

fkiraly · 2025-11-25T09:20:48Z

(kindly asking to keep to English so other contributors can also read)

what is the action that you are suggesting?

Using a frozen extract but with the most recent data removed?
Not sure if this is fine with the license terms. The liability would be with @robertmartin8 or GC.OS, so I would rather be on the very safe side.

Could we just use simulated data, or data where we know the license is ok?
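
A minimal sketch of what "very simple simulated data" could mean, assuming independent geometric Brownian motion per ticker on weekdays. The function name `simulate_prices`, the drift and volatility defaults, and the starting price of 100 are arbitrary illustrative choices, and the tickers are uncorrelated here, which real examples might not want.

```python
import math
import random
from datetime import date, timedelta


def simulate_prices(tickers, n_days=250, start=date(2020, 1, 1),
                    mu=0.05, sigma=0.2, seed=0):
    """Simulate daily prices per ticker; returns {ticker: [(day, price), ...]}."""
    rng = random.Random(seed)  # seeded, so every notebook run sees the same data
    dt = 1 / 252  # one trading day, in years

    # Collect the first n_days weekdays on or after `start`.
    days = []
    day = start
    while len(days) < n_days:
        if day.weekday() < 5:  # Monday to Friday
            days.append(day)
        day += timedelta(days=1)

    simulated = {}
    for ticker in tickers:
        price = 100.0
        series = []
        for d in days:
            shock = rng.gauss(0.0, 1.0)
            # Exact GBM step: log-returns are normally distributed.
            price *= math.exp((mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * shock)
            series.append((d, price))
        simulated[ticker] = series
    return simulated


prices = simulate_prices(["A", "B", "C"], n_days=5)
print(prices["A"][0])
```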

tschm · 2025-11-25T15:52:44Z

Can you please merge this? The notebook tests you copied into main.yml are somewhat unstable. You need some of the moderate fixes from this PR.

fkiraly

Sure, in principle.

I need more details though to review.

Can you please say what exactly the changes are to how the files are handled? What gets removed, and what exactly are we replacing the files with?
We are putting a "download on file execution" into download_prices; let's not do that. If we need to download something, it should happen as a call to an importable function.
There are changes in pyproject which seem unrelated or merge accidents.
Could you also split off the ruff formatting of the notebooks into a separate PR, so that this one only contains changes related to the dataset handling?

tschm added 15 commits November 15, 2025 11:28

Merge pull request #2 from PyPortfolio/main

335c809

update from central main

store data extracted with yfinance in a flat file

a8d410f

notebooks without yfinance

1954bfc

spy prices

5454597

update spy_prices and notebooks

7182b09

only one big price file

e57a6d1

fix notebook 1

4b7abfe

fix notebook 2

d01b3b4

fix notebook 3,4,5

e32b242

fmt download_prices

522d5bc

fmt download_prices

e09606e

fmt download_prices

59eee8a

fmt download_prices

d11a44f

remove yfinance

ad36f9d

double plotly?

c19c936

ruff as dev dependency

6d50ec7

fkiraly requested changes Nov 15, 2025

fkiraly changed the title ~~No yfinance~~ [ENH] in notebooks, replace yfinance download with loaders Nov 15, 2025

Merge branch 'main' into No_yfinance

df8cfb0

tschm added 2 commits November 15, 2025 14:47

remove the obscure pip install commands from the notebooks

3373c6b

remove obsolete print

fbb3814

tschm mentioned this pull request Nov 16, 2025

Create loader for data #679

Open

fkiraly changed the title ~~[ENH] in notebooks, replace yfinance download with loaders~~ [ENH] new data loaders, replace yfinance downloads Nov 16, 2025

Merge branch 'PyPortfolio:main' into No_yfinance

7b9162c

Merge branch 'PyPortfolio:main' into No_yfinance

65794a3

fkiraly requested changes Nov 25, 2025

Merge branch 'PyPortfolio:main' into No_yfinance

583259f
