Commit d73f5ba

Merge branch 'master' into pricing-events
2 parents a2528ed + cb60a4e commit d73f5ba

5 files changed: 64 additions, 27 deletions

CHANGELOG.md

Lines changed: 1 addition & 3 deletions
@@ -2,15 +2,13 @@
 
 All notable changes to this project will be documented in this file.
 
-<!-- git-cliff-unreleased-start -->
-## 3.0.3 - **not yet released**
+## [3.0.3](https://github.com/apify/apify-sdk-python/releases/tag/v3.0.3) (2025-10-21)
 
 ### 🐛 Bug Fixes
 
 - Cache requests in RQ implementations by `id` ([#633](https://github.com/apify/apify-sdk-python/pull/633)) ([76886ce](https://github.com/apify/apify-sdk-python/commit/76886ce496165346a01f67e018547287c211ea54)) by [@Pijukatel](https://github.com/Pijukatel), closes [#630](https://github.com/apify/apify-sdk-python/issues/630)
 
 
-<!-- git-cliff-unreleased-end -->
 ## [3.0.2](https://github.com/apify/apify-sdk-python/releases/tag/v3.0.2) (2025-10-17)
 
 ### 🐛 Bug Fixes

src/apify/storage_clients/_apify/_storage_client.py

Lines changed: 40 additions & 13 deletions
@@ -21,23 +21,50 @@
 
 @docs_group('Storage clients')
 class ApifyStorageClient(StorageClient):
-    """Apify storage client."""
+    """Apify platform implementation of the storage client.
+
+    This storage client provides access to datasets, key-value stores, and request queues that persist data
+    to the Apify platform. Each storage type is implemented with its own specific Apify client that stores data
+    in the cloud, making it accessible from anywhere.
+
+    The communication with the Apify platform is handled via the Apify API client for Python, which is an HTTP API
+    wrapper. For maximum efficiency and performance of the storage clients, various caching mechanisms are used to
+    minimize the number of API calls made to the Apify platform. Data can be inspected and manipulated through
+    the Apify console web interface or via the Apify API.
+
+    The request queue client supports two access modes controlled by the `request_queue_access` parameter:
+
+    ### Single mode
+
+    The `single` mode is optimized for scenarios with only one consumer. It minimizes API calls, making it faster
+    and more cost-efficient compared to the `shared` mode. This option is ideal when a single Actor is responsible
+    for consuming the entire request queue. Using multiple consumers simultaneously may lead to inconsistencies
+    or unexpected behavior.
+
+    In this mode, multiple producers can safely add new requests, but forefront requests may not be processed
+    immediately, as the client relies on local head estimation instead of frequent forefront fetching. Requests can
+    also be added or marked as handled by other clients, but they must not be deleted or modified, since such changes
+    would not be reflected in the local cache. If a request is already fully cached locally, marking it as handled
+    by another client will be ignored by this client. This does not cause errors but can occasionally result in
+    reprocessing a request that was already handled elsewhere. If the request was not yet cached locally, marking
+    it as handled poses no issue.
+
+    ### Shared mode
+
+    The `shared` mode is designed for scenarios with multiple concurrent consumers. It ensures proper synchronization
+    and consistency across clients, at the cost of higher API usage and slightly worse performance. This mode is safe
+    for concurrent access from multiple processes, including Actors running in parallel on the Apify platform. It
+    should be used when multiple consumers need to process requests from the same queue simultaneously.
+    """
 
     def __init__(self, *, request_queue_access: Literal['single', 'shared'] = 'single') -> None:
-        """Initialize the Apify storage client.
+        """Initialize a new instance.
 
         Args:
-            request_queue_access: Controls the implementation of the request queue client based on expected scenario:
-                - 'single' is suitable for single consumer scenarios. It makes less API calls, is cheaper and faster.
-                - 'shared' is suitable for multiple consumers scenarios at the cost of higher API usage.
-                Detailed constraints for the 'single' access type:
-                - Only one client is consuming the request queue at the time.
-                - Multiple producers can put requests to the queue, but their forefront requests are not guaranteed to
-                  be handled so quickly as this client does not aggressively fetch the forefront and relies on local
-                  head estimation.
-                - Requests are only added to the queue, never deleted by other clients. (Marking as handled is ok.)
-                - Other producers can add new requests, but not modify existing ones.
-                  (Modifications would not be included in local cache)
+            request_queue_access: Defines how the request queue client behaves. Use `single` mode for a single
+                consumer. It has fewer API calls, meaning better performance and lower costs. If you need multiple
+                concurrent consumers use `shared` mode, but expect worse performance and higher costs due to
+                the additional overhead.
         """
         self._request_queue_access = request_queue_access
 
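
The new class docstring ties the choice between `single` and `shared` access to the number of concurrent consumers. A minimal sketch of that rule follows. The helper `choose_request_queue_access` and its `consumer_count` parameter are hypothetical names used only for illustration and are not part of the SDK; only the `request_queue_access` values come from the diff.

```python
from typing import Literal

RequestQueueAccess = Literal['single', 'shared']


def choose_request_queue_access(consumer_count: int) -> RequestQueueAccess:
    """Pick an access mode from the expected number of concurrent consumers.

    Hypothetical helper, not part of the SDK. It only encodes the rule stated
    in the docstring: one consumer -> 'single', several consumers -> 'shared'.
    """
    if consumer_count < 1:
        raise ValueError('consumer_count must be at least 1')
    return 'single' if consumer_count == 1 else 'shared'


# Assumed usage with the constructor shown in this diff:
#   storage_client = ApifyStorageClient(
#       request_queue_access=choose_request_queue_access(consumer_count=3),
#   )
```

Since `single` is the constructor default, such a helper would only matter for code that sometimes runs several consumers against the same queue.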

src/apify/storage_clients/_smart_apify/_storage_client.py

Lines changed: 16 additions & 8 deletions
@@ -19,10 +19,18 @@
 
 @docs_group('Storage clients')
 class SmartApifyStorageClient(StorageClient):
-    """SmartApifyStorageClient that delegates to cloud_storage_client or local_storage_client.
+    """Storage client that automatically selects cloud or local storage client based on the environment.
 
-    When running on Apify platform use cloud_storage_client, else use local_storage_client. This storage client is
-    designed to work specifically in Actor context.
+    This storage client provides access to datasets, key-value stores, and request queues by intelligently
+    delegating to either the cloud or local storage client based on the execution environment and configuration.
+
+    When running on the Apify platform (which is detected via environment variables), this client automatically
+    uses the `cloud_storage_client` to store storage data there. When running locally, it uses the
+    `local_storage_client` to store storage data there. You can also force cloud storage usage from your
+    local machine by using the `force_cloud` argument.
+
+    This storage client is designed to work specifically in `Actor` context and provides a seamless development
+    experience where the same code works both locally and on the Apify platform without any changes.
     """
 
     def __init__(
@@ -31,13 +39,13 @@ def __init__(
         cloud_storage_client: ApifyStorageClient | None = None,
         local_storage_client: StorageClient | None = None,
     ) -> None:
-        """Initialize the Apify storage client.
+        """Initialize a new instance.
 
         Args:
-            cloud_storage_client: Client used to communicate with the Apify platform storage. Either through
-                `force_cloud` argument when opening storages or automatically when running on the Apify platform.
-            local_storage_client: Client used to communicate with the storage when not running on the Apify
-                platform and not using `force_cloud` argument when opening storages.
+            cloud_storage_client: Storage client used when an Actor is running on the Apify platform, or when
+                explicitly enabled via the `force_cloud` argument. Defaults to `ApifyStorageClient`.
+            local_storage_client: Storage client used when an Actor is not running on the Apify platform and when
+                `force_cloud` flag is not set. Defaults to `FileSystemStorageClient`.
         """
         self._cloud_storage_client = cloud_storage_client or ApifyStorageClient(request_queue_access='single')
         self._local_storage_client = local_storage_client or ApifyFileSystemStorageClient()
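
The delegation described in the `SmartApifyStorageClient` docstring can be modelled in a few lines. This is a simplified sketch, not the SDK's implementation: `StubStorageClient`, `EnvAwareStorageClient`, and `pick` are hypothetical names, and the use of the `APIFY_IS_AT_HOME` environment variable as the platform signal is an assumption, since the diff only says detection happens "via environment variables".

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class StubStorageClient:
    """Stand-in for a real storage client; it only carries a label."""

    name: str


class EnvAwareStorageClient:
    """Hypothetical, simplified model of the cloud/local delegation."""

    def __init__(
        self,
        cloud: StubStorageClient,
        local: StubStorageClient,
        env: Mapping[str, str] | None = None,
    ) -> None:
        self._cloud = cloud
        self._local = local
        # Allow injecting an environment mapping so the logic is easy to test.
        self._env = os.environ if env is None else env

    def _is_on_platform(self) -> bool:
        # Assumption: the platform sets APIFY_IS_AT_HOME to a truthy value.
        return self._env.get('APIFY_IS_AT_HOME', '').lower() in {'1', 'true'}

    def pick(self, *, force_cloud: bool = False) -> StubStorageClient:
        """Return the cloud client on the platform or when forced, else the local one."""
        if force_cloud or self._is_on_platform():
            return self._cloud
        return self._local
```

In this model, `force_cloud=True` always selects the cloud client, which mirrors the docstring's note that cloud storage can be forced from a local machine.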

website/docusaurus.config.js

Lines changed: 4 additions & 0 deletions
@@ -239,6 +239,10 @@ module.exports = {
         url: 'https://crawlee.dev/python/api/class/FileSystemStorageClient',
         group: 'Storage clients',
     },
+    {
+        url: 'https://crawlee.dev/python/api/class/SqlStorageClient',
+        group: 'Storage clients',
+    },
     // Request loaders
     {
         url: 'https://crawlee.dev/python/api/class/RequestLoader',

website/package-lock.json

Lines changed: 3 additions & 3 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.
