[Bug] filestatus not cached #7192

@flaming-archer

Description

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

The previous FileStatus is never reused from the cache: the source Hive table object is recreated on every query, so the FileStatus is recreated as well, and the cached object's key carries a different client ID each time, which invalidates the cache entry.

This causes three problems:

  1. Cache misses, which greatly hurt query performance, because fetching FileStatus from HDFS is a slow process.
  2. A memory leak: new objects are constantly added to the cache, and since the default cache expiration time is -1 (never expire), memory usage grows without bound.
  3. Because of the cache misses, Spark restarts the file-listing job, which consumes more time. If there are over 1000 partitions, this job starts over 1000 tasks.
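A minimal sketch of the failure mode described above, using hypothetical stand-in types (`HiveTableRef`, the cache map, and the table name are illustrative, not the actual Kyuubi classes): when the cache key embeds an object that is rebuilt on every query and relies on identity-based `equals`/`hashCode`, every lookup misses and every query inserts a fresh entry.

```java
import java.util.HashMap;
import java.util.Map;

public class CacheKeyDemo {
    // Stand-in for the per-query table object that forms part of the cache key.
    // No equals()/hashCode() override, so two instances with identical content
    // are never considered the same key (identity semantics).
    static class HiveTableRef {
        final String name;
        HiveTableRef(String name) { this.name = name; }
    }

    static final Map<HiveTableRef, String> fileStatusCache = new HashMap<>();

    public static void main(String[] args) {
        // First query: list files from HDFS (simulated) and cache the result.
        HiveTableRef first = new HiveTableRef("db.sales");
        fileStatusCache.put(first, "file statuses listed from HDFS");

        // Second query rebuilds the table object: same content, different
        // identity, so the lookup misses and a new entry is inserted.
        HiveTableRef second = new HiveTableRef("db.sales");
        System.out.println(fileStatusCache.containsKey(second)); // miss: false
        fileStatusCache.put(second, "file statuses listed again");
        System.out.println(fileStatusCache.size()); // grows on every query: 2
    }
}
```

With a never-expiring cache, this pattern produces both symptoms at once: no hit ever occurs, and the map only grows. A fix along these lines would key the cache by a stable value (e.g. the qualified table name or path) or give the key class value-based `equals`/`hashCode`.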

Affects Version(s)

master

Kyuubi Server Log Output

Kyuubi Engine Log Output

Kyuubi Server Configurations

Kyuubi Engine Configurations

Additional context

No response

Are you willing to submit PR?

  • Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
  • No. I cannot submit a PR at this time.

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests