Skip to content

[Suspected BUG] noprog_timeout is triggered too early #531

@ZeejayYang

Description

@ZeejayYang

Dear sir,

I'm using the GQUIC protocol and set both noprog_timeout and idle_timeout to 90 seconds. The data flow sends a data heartbeat every 60 seconds.
I'm encountering a problem. The log is as follows.

11:27:15.623 [QUIC:...DE39] conn: maybe_set_noprogress_alarm 3748 update last_tick: 34195475981963
11:27:15.623 [QUIC:...DE39] conn: set AL_IDLE new 34195475982050; exp 34195565976031 diff 89993981      // my debug log for set AL_IDLE alarm
11:27:15.624 [QUIC:...DE39] conn: maybe_set_noprogress_alarm 3748 update last_tick: 34195475982127
11:28:16.262 [QUIC:...DE39-3]                       stream: update last progress to 34195475982127      // update last_prog: 34195475982127 -> 11:27:15.624
11:28:16.263 [QUIC:...DE39] conn: maybe_set_noprogress_alarm 3748 update last_tick: 34195536621053      // update last_tick to now
11:28:16.434 [QUIC:...DE39] conn: set AL_IDLE new 34195536792569; exp 34195565982127 diff 29189558      // set last_prog(11:27:15.624) +90s to AL_IDLE alarm
11:28:16.434 [QUIC:...DE39] conn: maybe_set_noprogress_alarm 3748 update last_tick: 34195536792713      // update last_tick to now   11:28:16.434
11:28:16:003 [QUIC:...DE39-3] stream: stream_readf: read 412 bytes, read offset 10627, reached fin: 0
11:28:16.434 [QUIC:...DE39-3]                       stream: update last progress to 34195536792713      // update last_prog          11:28:16.434
11:28:16.460 [QUIC:...DE39] conn: maybe_set_noprogress_alarm 3748 update last_tick: 34195536818291      // but not update AL_IDLE alarm (bug?)
11:28:45.624 [QUIC:...DE39] conn: maybe_set_noprogress_alarm 3748 update last_tick: 34195565982305
11:28:45.624 [QUIC:...DE39] conn: connection timed out      // AL_IDLE timeout
11:28:45.624 [QUIC:...DE39] event: connection timed out

My debug log code:

#define lsquic_alarmset_set(alarmset, al_id, exp) do {                  \
    (alarmset)->as_armed_set |= 1 << (al_id);                           \
    (alarmset)->as_expiry[al_id] = exp;                                 \
    if (al_id == AL_IDLE) {(alarmset)->as_now[al_id] = lsquic_time_now(); \
        LSQ_DEBUG("set AL_IDLE new %"PRIu64"; exp %"PRIu64" diff %"PRIu64, (alarmset)->as_now[al_id],      \
        exp, exp - (alarmset)->as_now[al_id]);      \
    }   \
} while (0)

// ... 


/* This should be called before lsquic_alarmset_ring_expired() */
static void
maybe_set_noprogress_alarm (struct full_conn *conn, lsquic_time_t now)
{
    lsquic_time_t exp;

    if (conn->fc_flags & FC_NOPROG_TIMEOUT)
    {
        if (conn->fc_pub.last_tick)
        {
            exp = conn->fc_pub.last_prog + conn->fc_enpub->enp_noprog_timeout;
            if (!lsquic_alarmset_is_set(&conn->fc_alset, AL_IDLE)
                                    || exp < conn->fc_alset.as_expiry[AL_IDLE])
                lsquic_alarmset_set(&conn->fc_alset, AL_IDLE, exp);
            if (conn->fc_pub.last_tick != now)
                LSQ_DEBUG("maybe_set_noprogress_alarm %d update last_tick: %"PRIu64, __LINE__, now);
            conn->fc_pub.last_tick = now;
        }
        else
        {
            conn->fc_pub.last_tick = now;
            conn->fc_pub.last_prog = now;
        }
    }
}

In my use case, although last_prog was eventually updated, the AL_IDLE timer was not updated; instead, AL_IDLE was set to an older value.
I analyzed functions maybe_set_noprogress_alarm and maybe_update_last_progress, and found that their handling of updating last_prog and setting the AL_IDLE timer is rather disjointed. Therefore, this might be a bug?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions