[Enhancement][cache] Unify the enable/disable switch for File Cache on the FE side #59531

@wenzhenghu

Description

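2026-01-04 fe enable_file_cache和disable_file_cache对缓存读写操作影响分析.pdf (attachment: analysis of how the FE enable_file_cache and disable_file_cache settings affect cache read and write operations)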

Search before asking

  • I have searched the issues and found no similar issues.

Description

| Operation Type | Table Type | Effect of enable_file_cache | Effect of disable_file_cache |
| --- | --- | --- | --- |
| Write (Load) | Internal Table | None (ignored) | Controls write-to-cache (true = do not write to cache; false = write to cache) |
| Write (Load) | External Table | None (ignored; writes go directly to remote storage) | None (ignored; writes go directly to remote storage) |
| Read (Query) | External Table | On/off switch (true = use cache; false = read directly from remote) | Disposable flag (used when cache is enabled; determines whether data is "one-time use") |
| Read (Query) | Internal Table | Ineffective (always uses cache if globally enabled in BE) | Eviction policy control (true = uses disposable queue [TTL/LRU]; false = uses normal queue) |
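To make the current behavior easier to reason about, the table above can be written as a small decision function. This is only an illustrative model, not Doris code: the function name, the enums, and the returned labels are hypothetical, and the external-table read case assumes that disable_file_cache = true marks the data as disposable.

```python
from enum import Enum


class Op(Enum):
    WRITE = "write"
    READ = "read"


class TableType(Enum):
    INTERNAL = "internal"
    EXTERNAL = "external"


def current_cache_behavior(op, table, enable_file_cache, disable_file_cache):
    """Hypothetical model of the table above; returns a descriptive label."""
    if op is Op.WRITE:
        if table is TableType.EXTERNAL:
            # Both variables are ignored: writes go straight to remote storage.
            return "remote_only"
        # Internal writes: only disable_file_cache matters.
        return "remote_only" if disable_file_cache else "write_to_cache"

    if table is TableType.EXTERNAL:
        # External reads: enable_file_cache is the on/off switch.
        if not enable_file_cache:
            return "read_remote"
        # Assumption: disable_file_cache = true means "one-time use" data.
        return "cache_disposable" if disable_file_cache else "cache_normal"

    # Internal reads: enable_file_cache is ineffective (assumes the BE cache
    # is globally enabled); disable_file_cache only selects the queue.
    return "cache_disposable" if disable_file_cache else "cache_normal"
```

Written this way, the asymmetry is visible: each variable means something different depending on the operation and the table type.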

Solution

Suggested Optimization Solution

Provide enable_file_cache_olap_tables and enable_file_cache_external_catalogs as separate control switches for internal tables and data lake tables respectively.

• enable_file_cache_olap_tables: File cache switch for internal tables in storage-compute separation deployment mode (cloud_mode). Defaults to true (caching enabled).

  • If set to false, read operations go through the file cache's disposable queue (DISPOSABLE), and write operations bypass the file cache and write only to remote storage.

• enable_file_cache_external_catalogs: File cache switch for data lake tables. Defaults to false (caching disabled), in which case both read and write operations bypass the cache.

  • If set to true, read operations populate the file cache as usual. Writes still bypass the cache, because Doris does not currently support writing to the cache when writing data to data lake tables (to be discussed separately).
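The proposed semantics could be sketched roughly as follows. Only the two variable names and their defaults come from this proposal; the `SessionVars` class, the function, and the returned labels are hypothetical. The behavior of internal tables when the switch is true (reads and writes both use the cache) is an assumption inferred from "caching enabled by default".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionVars:
    # Defaults as proposed in this issue.
    enable_file_cache_olap_tables: bool = True
    enable_file_cache_external_catalogs: bool = False


def proposed_cache_behavior(op, table_type, session):
    """Hypothetical sketch. op is "read" or "write"; table_type is "olap" or "external"."""
    if op not in ("read", "write"):
        raise ValueError(f"unknown operation: {op}")

    if table_type == "olap":
        if session.enable_file_cache_olap_tables:
            # Assumed: with the switch on, reads and writes both use the cache.
            return "cache_normal" if op == "read" else "write_to_cache"
        # Switch off: reads use the disposable queue, writes skip the cache.
        return "cache_disposable" if op == "read" else "remote_only"

    if table_type == "external":
        if op == "write":
            # Writing to cache is not supported for data lake tables.
            return "remote_only"
        if session.enable_file_cache_external_catalogs:
            return "cache_normal"
        return "read_remote"

    raise ValueError(f"unknown table type: {table_type}")
```

With the defaults, `proposed_cache_behavior("read", "olap", SessionVars())` yields `"cache_normal"`, while the same read against an external table yields `"read_remote"`.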

Why Two Switches?

This is due to the different characteristics of internal tables and data lake tables:

• Data Volume Difference: Internal tables usually hold less data than data lake tables such as Hive, Iceberg, and Paimon. With properly sized cache space, the file cache can hold most of the query hotspots of internal tables. Data lake tables, by contrast, are generally at least one to two orders of magnitude larger than the cache space, so their data can only be cached on demand.

• Different Caching Purposes: For internal tables, the file cache is only available in storage-compute separation deployment mode. Its purpose is to deliver performance comparable to storage-compute integrated deployment mode, which supports business migration from storage-compute integrated/private deployments to storage-compute separation deployments and also helps promote SelectDB's paid cloud services. For data lake tables, the goal is to accelerate queries as much as possible, but matching internal table performance is not yet expected.

Given these differences, data lake table queries should disable the file cache by default, so that reading large amounts of data does not pollute existing cache hotspots. For internal tables in storage-compute separation mode, the file cache should be enabled by default to provide high-performance internal table queries.

If only a single global session-level switch variable is provided, the following problems arise:

• A session that runs both internal table queries (in storage-compute separation mode) and data lake queries must toggle the cache switch frequently, which is inefficient.

• A single query that accesses both internal tables and data lake tables cannot be handled correctly, because one switch cannot be both enabled and disabled at the same time.
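The mixed-query problem can be illustrated with a toy planner. Everything here is hypothetical (the function names, the table names, and the boolean "use cache" result), and the single-switch function models the hypothetical single global switch discussed above, not the current FE variables.

```python
def single_switch_plan(scans, enable_file_cache):
    """One global switch: every scan in the query gets the same decision."""
    return {name: enable_file_cache for name, _kind in scans}


def per_type_plan(scans, olap_on=True, external_on=False):
    """Two switches: the decision depends on each scan's table type."""
    return {
        name: (olap_on if kind == "olap" else external_on)
        for name, kind in scans
    }


# A single query joining an internal table with a data lake table
# (both table names are made up for illustration).
scans = [("orders", "olap"), ("hive.logs.events", "external")]

print(single_switch_plan(scans, True))
print(single_switch_plan(scans, False))
print(per_type_plan(scans))
```

With one switch, both scans are either cached or not cached. Only the per-type plan can cache the internal table scan while keeping the data lake scan out of the cache within the same query.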

In summary, it is necessary to provide enable_file_cache_olap_tables and enable_file_cache_external_catalogs as separate switches that control file cache behavior for internal tables and data lake tables respectively.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct
