-
Notifications
You must be signed in to change notification settings - Fork 1.1k
[ENH] new data loaders, replace yfinance downloads
#678
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
update from central main
|
@fkiraly this is good to be merged. Now using one flat file common for all notebooks. I have added the script to generate this file. |
fkiraly
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! I really think we need to replace downloads by onboard data.
- I believe you have to explicitly request
csvto be packaged inpyproject, or users will not be able to load the data. This becomes apparent in the wheels/release test (but we are not running a packaging test regularly). - could you do separate things in separate PR? E.g., linting the notebooks, changing dependencies, etc.
- The pattern that I would use is adding a
load_datasetnamefunction instead of directly loading acsv. That makes the notebook easier to read.
Different topic, I think the pip installs should be removed from the notebooks, this messes up the VM that tests the notebooks.
yfinance download with loaders
|
Plus, could you please, please write descriptive summaries for your PR? Use AI if you need to. |
|
I don't understand. Do you want to include the notebooks in the package released? |
??? why would one have this in a notebook? |
no, I want loader functions for the csv in the package release, and the notebooks then import the loader from the package, rather than loading the from pypfopt.data import load_something
my_dummydata = load_something()All the csv manipulation is hidden underneath, ensuring that the notebook is short and does not distract from what is being shown. |
Could be done, but I would still not put the csv file into the package. I would do something like load_prices("my_file.csv", ticker=["A","B","C"], start=1990-01-01) |
How large are they? If the files are too large, we could cut them to be smaller? I think 1MB of example data is ok for a package. Imo it is a nice user experience to have some testing data shipped with a package for, testing or learning how to use it. |
yfinance download with loadersyfinance downloads
More like 6.6 MB at the moment. We could change the examples and use the same tickers across... and maybe less history... |
|
But this functionality would only be used by people "developing" the package. For users I don't see the point of having data in there. We would also be on thin ice legally as you can not "distribute" financial data :-) For that purpose I should also remove the last year from the data... |
The point is running the examples, so users can play around with the python code and the data sets.
You are absolutely right! Thanks for pointing this out!! How about we replace the downloader by some very simple simulated data then, to avoid any legal liabilities that go with using |
Ich denke, dass wir nicht päpstlicher als der Papst sein müssen. Ich würde einfach die Daten aus diesem Jahr löschen. Auch im yfinance package gibt es in den Test resourcen genug authentic files. Wir könnten auch die Ticker hashen, aber das wäre doch alles mühsam. Es wird sich niemand beschweren und falls doch, koennen wir gewiss reagieren. |
|
(kindly asking to keep to English so other contributors can also read) what is the action that you are suggesting? Using a frozen extract but with the most recent data removed? Could we just use simulated data, or data where we know the license is ok? |
|
can you please merge this. The tests of the notebooks you copied into main.yml are somewhat unstable. You need some of the moderate fixes |
fkiraly
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, in-principle.
I need more details though to review.
- can you please say what exactly the changes are to how the files are handled? What gets removed, what are we replacing the files by, exactly?
- we are putting a "download on file execution" on
download_prices, let's not do that. If we need to download something, it should happen as a call to an importable function. - there are changes in
pyprojectwhich seem unrelated or merge accidents. - could you also split off the
ruffformatting of the notebooks into a separate PR, so that this one only contains changes related to the dataset handling?
Get rid of yfinance. Create on csv file serving all notebooks. Closes #677