-
Notifications
You must be signed in to change notification settings - Fork 75
fix: Add batching for visitors data loading for Matomo to avoid too-l… #666
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
fix: Add batching for visitors data loading for Matomo to avoid too-l… #666
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reporting the issue related to long URLs! Just a minor comment
Btw, don't forget to rebase on master - some import errors in tests are fixed now (same goes for the other PR)
sources/matomo/__init__.py
Outdated
| @dlt.transformer( | ||
| data_from=get_last_visits, | ||
| write_disposition="merge", | ||
| name="visitors", | ||
| primary_key="visitorId", | ||
| ) | ||
| def get_unique_visitors_with_chunk( | ||
| visits: List[DictStrAny], | ||
| client: MatomoAPIClient, | ||
| site_id: int, | ||
| chunk_size: int = 20, | ||
| ) -> Iterator[TDataItem]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since get_unique_visitors_with_chunk is a slightly modified version of get_unique_visitors, how about we just adjust the get_unique_visitors instead of having a new function that looks almost the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, I planned to keep that function for anyone still wants to use that for full load (in case they have less data), but I guess they can always adjust the chunk size for their purpose
Note: Already rebased, there is no new update from master branch
sources/matomo/__init__.py
Outdated
| visits (List[DictStrAny]): List of dicts containing information on last visits in the given timeframe. | ||
| client (MatomoAPIClient): Used to make calls to Matomo API. | ||
| site_id (int): Every site in Matomo has a unique id. | ||
| chunk_size (int): Number of visitor IDs to process in each batch. Defaults to 100. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be Defaults to 20, but this is irrelevant if we adjust the existing get_unique_visitors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
|
Pushed the fixes for the comments, already run "make lint", all passed |
|
those changes look good to me, but I am not sure why they actually solve the underlying issue. previoulsy there was always batching with 100 now its customizable. what am i missing? |
Hello, the current implementation never worked for me, if I understand correctly, it will fetch visitors' data and input to get visit data (please correct me if I'm wrong), and then I always got the "URL too long" issue due to too many visitors, that's why I updated this to ensure we can customize it, based on my tests, 20 worked well for me |
…ong-url issue
Tell us what you do here
Short description
Related Issues
Additional Context