@ghkang98
Contributor

@ghkang98 ghkang98 commented Jan 5, 2026

…ped or not alive on delete from command.

When a node is abnormal or has been dropped, the delete job does not filter out the replicas on that node, so task dispatch fails and the Delete operation cannot execute properly. In addition, a failed task is retried repeatedly without considering that, in some scenarios, a retry cannot resolve the problem. Such a task ends only when the overall timeout of the outer task fires, which can leave it stuck for a long time.
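A minimal sketch of the two ideas in this description: skip replicas whose backend is dropped or not alive before dispatching, and stop retrying once a failure is known to be unrecoverable. Every name here (`Replica`, `DeleteDispatchSketch`, `Outcome`, `dispatchable`, `runWithRetry`) is a hypothetical stand-in for illustration and not the Doris FE code touched by this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a replica record; not a Doris class.
final class Replica {
    final long replicaId;
    final long backendId;

    Replica(long replicaId, long backendId) {
        this.replicaId = replicaId;
        this.backendId = backendId;
    }
}

final class DeleteDispatchSketch {
    // Hypothetical result of one dispatch attempt.
    enum Outcome { OK, RETRYABLE, FATAL }

    private DeleteDispatchSketch() {}

    // Keeps only replicas whose backend is known and alive.
    // A missing map entry is treated as a dropped backend.
    static List<Replica> dispatchable(List<Replica> replicas,
                                      Map<Long, Boolean> aliveByBackendId) {
        List<Replica> result = new ArrayList<>();
        for (Replica r : replicas) {
            Boolean alive = aliveByBackendId.get(r.backendId);
            if (Boolean.TRUE.equals(alive)) {
                result.add(r);
            }
        }
        return result;
    }

    // Retries only while the attempt reports RETRYABLE and the budget remains.
    // OK and FATAL both end the loop immediately.
    static Outcome runWithRetry(Supplier<Outcome> attempt, int maxAttempts) {
        Outcome last = Outcome.RETRYABLE;
        for (int i = 0; i < maxAttempts && last == Outcome.RETRYABLE; i++) {
            last = attempt.get();
        }
        return last;
    }
}
```

With this shape, a `FATAL` outcome returns after a single attempt instead of waiting for an outer timeout.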

  • Test
    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)

@Thearas
Contributor

Thearas commented Jan 5, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@yiguolei
Contributor

yiguolei commented Jan 6, 2026

run buildall

st = status;
public void addMark(K key, V value) {
synchronized (lock) {
marks.put(key, value);
Contributor

If `downLatch != null` here, this should raise an error. We cannot support changing the number of marks after `await` has been called.

Contributor Author

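OK, a check can be added here. This behavior was left in place mainly for compatibility with the existing usage. We could require callers to know the target count in advance and call the constructor that takes a count, so that `downLatch` is initialized when the `MarkedCountDownLatch` is constructed. If we do not need to stay compatible with that case, then this behavior should indeed be restricted here.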
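A minimal sketch of the check discussed in this thread, assuming the inner latch is created lazily on the first `await`. `GuardedMarkLatch` is a hypothetical, simplified class written for illustration. It is not the `MarkedCountDownLatch` from this PR, and the lazy-creation design is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical simplified latch; not the Doris MarkedCountDownLatch.
final class GuardedMarkLatch<K, V> {
    private final Object lock = new Object();
    private final Map<K, V> marks = new HashMap<>();
    private CountDownLatch downLatch; // assumption: created on the first await

    void addMark(K key, V value) {
        synchronized (lock) {
            // Once the latch exists its count is fixed, so reject new marks.
            if (downLatch != null) {
                throw new IllegalStateException("cannot add marks after await");
            }
            marks.put(key, value);
        }
    }

    boolean markedCountDown(K key) {
        CountDownLatch latch;
        synchronized (lock) {
            if (!marks.containsKey(key)) {
                return false;
            }
            marks.remove(key);
            latch = downLatch;
        }
        // If the latch does not exist yet, it will be sized from the
        // remaining marks, so no explicit countDown is needed.
        if (latch != null) {
            latch.countDown();
        }
        return true;
    }

    boolean await(long timeout, TimeUnit unit) throws InterruptedException {
        CountDownLatch latch;
        synchronized (lock) {
            if (downLatch == null) {
                downLatch = new CountDownLatch(marks.size());
            }
            latch = downLatch;
        }
        // Wait outside the lock so markedCountDown is not blocked.
        return latch.await(timeout, unit);
    }
}
```

Because the count is taken from `marks.size()` at the moment the latch is created, the count and the number of marks cannot drift apart in this sketch, which is the concern raised by the reviewer.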

public synchronized void addMark(K key, V value) {
marks.put(key, value);
public long getCount() {
synchronized (lock) {
Contributor

What is the difference between this `lock` and simply adding the `synchronized` modifier to the method?

Contributor Author

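The main consideration is that some functions, such as wait, lock only a code block rather than the whole method. To keep the style consistent across the class, block-level locks are used inside the functions.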
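For reference, a small sketch of the language-level difference behind this question: a `synchronized` instance method locks on `this`, while a block on a private field locks on that field's object, which outside code cannot reach. `LockStyles` and its methods are hypothetical names used only for illustration.

```java
// Hypothetical class comparing the two locking styles.
final class LockStyles {
    private final Object lock = new Object();

    // Method-level: the monitor is `this`, as if the body were wrapped
    // in synchronized (this) { ... }.
    synchronized boolean methodLevelHoldsThis() {
        return Thread.holdsLock(this);
    }

    // Block-level on a private object: `this` is not locked.
    boolean blockLevelHoldsThis() {
        synchronized (lock) {
            return Thread.holdsLock(this);
        }
    }

    // Block-level on a private object: the private lock is held.
    boolean blockLevelHoldsPrivateLock() {
        synchronized (lock) {
            return Thread.holdsLock(lock);
        }
    }
}
```

The two styles do not exclude each other: a thread inside a `synchronized` method and another inside a `synchronized (lock)` block can run at the same time, so a class should use one monitor consistently for the state it guards.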

public MarkedCountDownLatch(int count) {
super(count);
this.markCount = count;
this.downLatch = new CountDownLatch(count);
Contributor

What is the purpose of keeping this constructor?

Contributor

If this constructor cannot be removed, it means that the count may sometimes differ from the number of marks, which could be a problem.

Contributor Author

As with the case above, this constructor is kept to reduce the amount of change and to stay compatible with some existing usages.
