Skip to content

Conversation

@tschm
Copy link
Contributor

@tschm tschm commented Nov 15, 2025

Get rid of yfinance. Create on csv file serving all notebooks. Closes #677

@tschm
Copy link
Contributor Author

tschm commented Nov 15, 2025

@fkiraly this is good to be merged. Now using one flat file common for all notebooks. I have added the script to generate this file.

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I really think we need to replace downloads by onboard data.

  • I believe you have to explicitly request csv to be packaged in pyproject, or users will not be able to load the data. This becomes apparent in the wheels/release test (but we are not running a packaging test regularly).
  • could you do separate things in separate PR? E.g., linting the notebooks, changing dependencies, etc.
  • The pattern that I would use is adding a load_datasetname function instead of directly loading a csv. That makes the notebook easier to read.

Different topic, I think the pip installs should be removed from the notebooks, this messes up the VM that tests the notebooks.

@fkiraly fkiraly changed the title No yfinance [ENH] in notebooks, replace yfinance download with loaders Nov 15, 2025
@fkiraly
Copy link
Collaborator

fkiraly commented Nov 15, 2025

Plus, could you please, please write descriptive summaries for your PR? Use AI if you need to.

@tschm
Copy link
Contributor Author

tschm commented Nov 15, 2025

I don't understand. Do you want to include the notebooks in the package released?

@tschm
Copy link
Contributor Author

tschm commented Nov 15, 2025

import os
if not os.path.isdir('data'):
    os.system('git clone https://github.com/pyportfolio/pyportfolioopt.git')
    os.chdir('PyPortfolioOpt/cookbook')

??? why would one have this in a notebook?

@fkiraly
Copy link
Collaborator

fkiraly commented Nov 15, 2025

I don't understand. Do you want to include the notebooks in the package released?

no, I want loader functions for the csv in the package release, and the notebooks then import the loader from the package, rather than loading the csv directly.

from pypfopt.data import load_something

my_dummydata = load_something()

All the csv manipulation is hidden underneath, ensuring that the notebook is short and does not distract from what is being shown.

@tschm
Copy link
Contributor Author

tschm commented Nov 16, 2025

I don't understand. Do you want to include the notebooks in the package released?

no, I want loader functions for the csv in the package release, and the notebooks then import the loader from the package, rather than loading the csv directly.

from pypfopt.data import load_something

my_dummydata = load_something()

All the csv manipulation is hidden underneath, ensuring that the notebook is short and does not distract from what is being shown.

Could be done, but I would still not put the csv file into the package. I would do something like

load_prices("my_file.csv", ticker=["A","B","C"], start=1990-01-01)

@tschm tschm mentioned this pull request Nov 16, 2025
@fkiraly
Copy link
Collaborator

fkiraly commented Nov 16, 2025

Could be done, but I would still not put the csv file into the package.

How large are they? If the files are too large, we could cut them to be smaller? I think 1MB of example data is ok for a package. Imo it is a nice user experience to have some testing data shipped with a package for, testing or learning how to use it.

@fkiraly fkiraly changed the title [ENH] in notebooks, replace yfinance download with loaders [ENH] new data loaders, replace yfinance downloads Nov 16, 2025
@tschm
Copy link
Contributor Author

tschm commented Nov 16, 2025

Could be done, but I would still not put the csv file into the package.

How large are they? If the files are too large, we could cut them to be smaller? I think 1MB of example data is ok for a package. Imo it is a nice user experience to have some testing data shipped with a package for, testing or learning how to use it.

More like 6.6 MB at the moment. We could change the examples and use the same tickers across... and maybe less history...

@tschm
Copy link
Contributor Author

tschm commented Nov 16, 2025

But this functionality would only be used by people "developing" the package. For users I don't see the point of having data in there. We would also be on thin ice legally as you can not "distribute" financial data :-) For that purpose I should also remove the last year from the data...

@fkiraly
Copy link
Collaborator

fkiraly commented Nov 18, 2025

For users I don't see the point of having data in there

The point is running the examples, so users can play around with the python code and the data sets.

We would also be on thin ice legally as you can not "distribute" financial data :-) For that purpose I should also remove the last year from the data...

You are absolutely right! Thanks for pointing this out!!

How about we replace the downloader by some very simple simulated data then, to avoid any legal liabilities that go with using yfinance or data from it?

@tschm
Copy link
Contributor Author

tschm commented Nov 19, 2025

For users I don't see the point of having data in there

The point is running the examples, so users can play around with the python code and the data sets.

We would also be on thin ice legally as you can not "distribute" financial data :-) For that purpose I should also remove the last year from the data...

You are absolutely right! Thanks for pointing this out!!

How about we replace the downloader by some very simple simulated data then, to avoid any legal liabilities that go with using yfinance or data from it?

Ich denke, dass wir nicht päpstlicher als der Papst sein müssen. Ich würde einfach die Daten aus diesem Jahr löschen. Auch im yfinance package gibt es in den Test resourcen genug authentic files. Wir könnten auch die Ticker hashen, aber das wäre doch alles mühsam. Es wird sich niemand beschweren und falls doch, koennen wir gewiss reagieren.

@fkiraly
Copy link
Collaborator

fkiraly commented Nov 25, 2025

(kindly asking to keep to English so other contributors can also read)

what is the action that you are suggesting?

Using a frozen extract but with the most recent data removed?
Not sure if this is fine with the license terms - the liability would be with @robertmartin8 or GC.OS, so I rather prefer to be on the very safe side.

Could we just use simulated data, or data where we know the license is ok?

@tschm
Copy link
Contributor Author

tschm commented Nov 25, 2025

can you please merge this. The tests of the notebooks you copied into main.yml are somewhat unstable. You need some of the moderate fixes

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, in-principle.

I need more details though to review.

  • can you please say what exactly the changes are to how the files are handled? What gets removed, what are we replacing the files by, exactly?
  • we are putting a "download on file execution" on download_prices, let's not do that. If we need to download something, it should happen as a call to an importable function.
  • there are changes in pyproject which seem unrelated or merge accidents.
  • could you also split off the ruff formatting of the notebooks into a separate PR, so that this one only contains changes related to the dataset handling?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ENH] Remove yfinance from notebooks

2 participants