Skip to content

Conversation

@NHNguyen94
Copy link

…ong-url issue

Tell us what you do here

  • Add batching function for loading visitors data to avoid too-long-url issue
  • Still maintain the current function for those who might need

Short description

Related Issues

Additional Context

@anuunchin anuunchin self-assigned this Oct 30, 2025
Copy link
Collaborator

@anuunchin anuunchin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reporting the issue related to long URLs! Just a minor comment

Btw, don't forget to rebase on master - some import errors in tests are fixed now (same goes for the other PR)

Comment on lines 226 to 237
@dlt.transformer(
data_from=get_last_visits,
write_disposition="merge",
name="visitors",
primary_key="visitorId",
)
def get_unique_visitors_with_chunk(
visits: List[DictStrAny],
client: MatomoAPIClient,
site_id: int,
chunk_size: int = 20,
) -> Iterator[TDataItem]:
Copy link
Collaborator

@anuunchin anuunchin Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since get_unique_visitors_with_chunk is a slightly modified version of get_unique_visitors, how about we just adjust the get_unique_visitors instead of having a new function that looks almost the same?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, I planned to keep that function for anyone still wants to use that for full load (in case they have less data), but I guess they can always adjust the chunk size for their purpose

Note: Already rebased, there is no new update from master branch

visits (List[DictStrAny]): List of dicts containing information on last visits in the given timeframe.
client (MatomoAPIClient): Used to make calls to Matomo API.
site_id (int): Every site in Matomo has a unique id.
chunk_size (int): Number of visitor IDs to process in each batch. Defaults to 100.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be Defaults to 20, but this is irrelevant if we adjust the existing get_unique_visitors

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@NHNguyen94
Copy link
Author

Pushed the fixes for the comments, already run "make lint", all passed

@djudjuu
Copy link
Contributor

djudjuu commented Nov 24, 2025

those changes look good to me, but I am not sure why they actually solve the underlying issue. previoulsy there was always batching with 100 now its customizable. what am i missing?

@NHNguyen94
Copy link
Author

those changes look good to me, but I am not sure why they actually solve the underlying issue. previoulsy there was always batching with 100 now its customizable. what am i missing?

Hello, the current implementation never worked for me, if I understand correctly, it will fetch visitors' data and input to get visit data (please correct me if I'm wrong), and then I always got the "URL too long" issue due to too many visitors, that's why I updated this to ensure we can customize it, based on my tests, 20 worked well for me

@NHNguyen94 NHNguyen94 requested a review from anuunchin November 24, 2025 10:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Too-long-URL issue when loading visits / visitors data

3 participants