Skip to content

fix: fixed error handling in asgi application lifespan #2352

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

marcm-ml
Copy link

Description

Fixes exception handling in ASGI application startup. The previous implementation had the side-effect that exception within startup (i.e. broker failure), would cause the underlying ASGI server to get stuck waiting for either the "lifespan.startup.complete" or "lifespan.startup.failed" event. In this state, the ASGI server would ignore all signals (like SIGINT) and becomes unresponsive.

Fixes #2305

Type of change

Please delete options that are not relevant.

  • Documentation (typos, code examples, or any documentation updates)
  • Bug fix (a non-breaking change that resolves an issue)
  • New feature (a non-breaking change that adds functionality)
  • Breaking change (a fix or feature that would disrupt existing functionality)
  • This change requires a documentation update

Checklist

  • My code adheres to the style guidelines of this project (scripts/lint.sh shows no errors)
  • I have conducted a self-review of my own code
  • My changes do not generate any new warnings
  • I have added tests to validate the effectiveness of my fix or the functionality of my new feature
  • Both new and existing unit tests pass successfully on my local environment by running scripts/test-cov.sh
  • I have ensured that static analysis tests are passing by running scripts/static-analysis.sh

@CLAassistant
Copy link

CLAassistant commented Jul 13, 2025

CLA assistant check
All committers have signed the CLA.

@marcm-ml
Copy link
Author

When running tests on my computer, I get unrelated errors. I assume these are flaky tests? However, the new tests run perfectly fine.

@marcm-ml
Copy link
Author

@Lancetnik feel free to review whenever you find the time. This should fix any of the issues discussed in the linked issue. Thanks

Copy link
Collaborator

@Lancetnik Lancetnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests doesn't proof related changes. All new tests are green with the old code as well

@marcm-ml
Copy link
Author

marcm-ml commented Jul 13, 2025

Ah okay, i might understand the problem now.
The behaviour of the TestClient and running things via uvicorn is different.
When I run the example script from the issue with uvicorn, the uvicorn server lives forever with the old code and exists immediately with the new code. In both cases the faststream app indicates a shutdown.
If I replace the uvicorn server with the TestClient, I get an immediate exit with the old and new code.

from time import sleep

from faststream.asgi import AsgiFastStream
from faststream.nats.broker import NatsBroker

broker = NatsBroker()
app = AsgiFastStream(broker)


@app.on_startup
async def setup():
    raise ValueError()


if __name__ == "__main__":
    from starlette.testclient import TestClient

    with TestClient(app, base_url="http://localhost:8888") as client:
        sleep(100)

This leaves me with the question, how I could test this properly. Perhaps, I could add a test case in the CLI test suite which uses uvicorn under the hood? Do you have other ideas? I think the current tests are useless as you mentioned

@Lancetnik
Copy link
Collaborator

This leaves me with the question, how I could test this properly. Perhaps, I could add a test case in the CLI test suite which uses uvicorn under the hood? Do you have other ideas? I think the current tests are useless as you mentioned

Thank you!
I hope, @Sehat1137 will help us here. He is a codeowner of ASGI CLI support.

@marcm-ml
Copy link
Author

I have added a better test that now fails with the old code and succeeds with the new code.
Funny enough, when using lifespan, the startup is correctly aborted.

from contextlib import asynccontextmanager

from faststream.asgi import AsgiFastStream
from faststream.nats.broker import NatsBroker

broker = NatsBroker()


@asynccontextmanager
async def lifespan():
    raise ValueError("Lifespan error")
    yield


app = AsgiFastStream(broker, lifespan=lifespan)

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app)

@marcm-ml
Copy link
Author

marcm-ml commented Jul 13, 2025

@Sehat1137 do you know why the test_run test failed? When I am running this locally (and devcontainer), all tests pass

Edit:
also now that a broker failure will crash the ASGI startup, some tests fail since the broker cannot establish a proper connection. This was not a problem before as the broker startup was done after the lifespan startup event has been sent. More specifically, since the NatsBroker tries to reconnect 60 times each waiting for 2 seconds, the startup is super slow in this case. Before the ASGI app became immediately available but the broker did not.

Is this something that breaks the current behaviour of the ASGI app in your eyes @Lancetnik ?

@Sehat1137 Sehat1137 self-requested a review July 13, 2025 19:33
@Sehat1137
Copy link
Collaborator

Please, rebase your branch

@marcm-ml
Copy link
Author

Hey, I have added the tests in tests/cli/rabbit/test_app.py. I had to call uvicorn directly to make sure I get access to the uvicron server instance to check if the appropiate lifespan hook has been called.

Additionally, I had to refactor how tests/cli/test_run.py was setup. Please take a look if you are happy with the result.

@marcm-ml marcm-ml requested a review from Sehat1137 July 14, 2025 11:51
@Sehat1137
Copy link
Collaborator

@marcm-ml
Previously, this worked without using mocks.
If you had to edit existing tests, you most likely broke backward compatibility.
@Lancetnik what we can do with it?

@marcm-ml
Copy link
Author

marcm-ml commented Jul 14, 2025

also now that a broker failure will crash the ASGI startup, some tests fail since the broker cannot establish a proper connection. This was not a problem before as the broker startup was done after the lifespan startup event has been sent. More specifically, since the NatsBroker tries to reconnect 60 times each waiting for 2 seconds, the startup is super slow in this case. Before the ASGI app became immediately available but the broker did not.

I tried to explain the reason previously here. I think it was not apparent that the broker setup in this specific test was always unsuccessful since the broker startup was handled after asgi startup.

My original issue was created exactly due to this behaviour where a broker setup was not successful but the asgi server kept running.

Edit:
If you replace the NatsBroker in tests/cli/test_run.py with RabbitBroker, you will get failing tests due to connection issues with the current main branch.

@Sehat1137
Copy link
Collaborator

@marcm-ml
Yes, now I understand better what the problem is. Thank you, I will write a little later when I think about how to solve this situation

@Sehat1137
Copy link
Collaborator

@marcm-ml
I tried to run the simple application if nats is not available. The application freezes and cannot be stopped using ctrl+c.

Previously, such behavior was not observed. But this actual only for NatsBroker.

You may try to repeat this case

from faststream import FastStream
from faststream.nats import NatsBroker  # Not ok
from faststream.rabbit import RabbitBroker  # ok
from faststream.redis import RedisBroker  # ok
from faststream.kafka import KafkaBroker  # ok


broker = NatsBroker()

app = FastStream(broker)

@marcm-ml
Copy link
Author

for me every broker is failing. Nats is appears stuck because it has a high reconnect attempt setting. if you set max_reconnect_attempts=1 than nats fails instantly for me. Redis is also stuck for me. Kafka and Rabbit exit instantly.

@marcm-ml
Copy link
Author

marcm-ml commented Jul 14, 2025

Maybe to clarify:
I get errors for all broker in main and in my code with your example.

The behaviour is expected that ctrl+c does not work in my code but works in main branch. We are now awaiting the startup and not sending it off as a background task using anyio task group. During the startup lifespan, uvicorn traps any signal (like ctrl+c / SIGINT) until startup is done. since the startup is only done after the broker has been setup, we cannot abort the startup. It is unfortunate that nats default reconnect parameter is set as high (60*2s = 2mins), which means u need to wait 2 mins to exit. I argue that this was the reason the current implementation was setup in that way since it appears that the application is unresponsive. I would argue that this is a issue with nats implementation rather than the ASGI one.

I think it is up to you guys to decide what kind of behaviour you would want to see here
@Lancetnik @Sehat1137

Edit:

You can see the same behaviour without using fastream. Run via uvicorn script:app and try to press CTRL+C

import asyncio
from contextlib import asynccontextmanager

from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app):
    await asyncio.sleep(10) # this is equivalent to nats broker trying to connect
    yield

app = FastAPI(lifespan=lifespan)

@Sehat1137
Copy link
Collaborator

@marcm-ml
Maybe you can think about how to solve this problem without breaking the current behavior? If you don't want, you may create an issue and we try to solve this in near releases

@marcm-ml
Copy link
Author

I believe the current implementation is fundamentally broken and not how an ASGI application should be started. If you want to retain the current behaviour, you must make sure that the self._startup does not block. Then you need to figure out how to shutdown the ASGI server when an exception on self._startup was raised. For me this makes things more difficult since the broker setup should be part of the ASGI setup.
On a different note: Regarding semantic versioning, your are still in an early development phase (0.x.x) which should indicate to users that they need to expect breaking changes.

@Lancetnik
Copy link
Collaborator

@marcm-ml sorry, I didn't investigate for the problem too close before

I can't agree, that broker startup error should fails ASGI runtime

from faststream.asgi import AsgiFastStream, AsgiResponse, get
from faststream.nats import NatsBroker

@get
async def index(scope):
    return AsgiResponse(b"Hello, World!", 200)

broker = NatsBroker()
app = AsgiFastStream(broker, asgi_routes=[("/test", index)])

The following code process /test endpoint correctly even if NatsBroker is not connected – so I can decide, that ASGI works fine and we shouldn't fail. Broker connection is in another application scope and should be checked by another mechanism (like probes)

But, lifespan errors should fail the app according ASGI specification. I suggest to split your Issue into two:

  1. Fail ASGI startup if broker is not available – I think, we should reject it
  2. Fail ASGI statup if lifespan error occured – this is the bug indeed and we should fix it

@Lancetnik
Copy link
Collaborator

On a different note: Regarding semantic versioning, your are still in an early development phase (0.x.x) which should indicate to users that they need to expect breaking changes.

It is not correct. We follow semantic versioning, so in our case it means that versions under 1.0.0 has a majority in 0.X release. Therefore we can't break compatibility in patches like 0.5.* – all breaking changes should be made in next (or even later) major 0.6.0 for now.

Additionally, we have many users and a large community, so we cannot break compatibility without a deprecation phase. Any breaking changes must have the following lifespan:

  1. New feature introduction
  2. Old syntax must still be supported, deprecated, and mapped to the new feature. We must support both versions (new and old) for at least one major release
  3. After that, we can remove the old syntax with a deprecation notice

@Lancetnik
Copy link
Collaborator

The problem is, that we call on_startup, broker.start and after_startup as a single call here:

https://github.com/ag2ai/faststream/blob/main/faststream/_internal/application.py#L168-L180

And we move this call to asyncio task in ASGI – https://github.com/ag2ai/faststream/blob/main/faststream/asgi/app.py#L189

This reason, the following code doesn't fail ASGI application startup

@app.on_startup
async def setup():
    raise ValueError()

To fix it we should separate on_startup from original start method and call it is a part of ASGI lifespan. But, broker.start and after_startup should stay in task

@marcm-ml
Copy link
Author

marcm-ml commented Jul 15, 2025

On a different note: Regarding semantic versioning, your are still in an early development phase (0.x.x) which should indicate to users that they need to expect breaking changes.

It is not correct. We follow semantic versioning, so in our case it means that versions under 1.0.0 has a majority in 0.X release. Therefore we can't break compatibility in patches like 0.5.* – all breaking changes should be made in next (or even later) major 0.6.0 for now.

Additionally, we have many users and a large community, so we cannot break compatibility without a deprecation phase. Any breaking changes must have the following lifespan:

  1. New feature introduction

  2. Old syntax must still be supported, deprecated, and mapped to the new feature. We must support both versions (new and old) for at least one major release

  3. After that, we can remove the old syntax with a deprecation notice

Point 4. of the semantic versioning spec states that version 0.x is initial development and subject to any change.

https://semver.org/#spec-item-4

@Lancetnik
Copy link
Collaborator

Point 4. of the semantic versioning spec states that version 0.x is initial development and subject to any change.

semver.org

Yeah, I had read the specification. Anyway, we conclude to use minor patches as major until 1.0 release. So, we introduce any minor features & patches as patch releases

@marcm-ml
Copy link
Author

I am confused that broker startup should not be part of the ASGI startup.

But if you do not want to change that behaviour, it is fine with me.
You can close the PR if you want. Thanks for your time 👍

@Lancetnik
Copy link
Collaborator

I am confused that broker startup should not be part of the ASGI startup.

But if you do not want to change that behaviour, it is fine with me. You can close the PR if you want. Thanks for your time

Thank you for your time and effort in investigating the issue. You have done a great job, and I apologize for not clarifying things sooner. I attempted to clarify my thoughts and concerns, but we can still continue to discuss them. This is just my perspective.

In any case, the original issue was regarding

@app.on_startup
async def setup():
    raise ValueError()

which does not fail the application. The issue still exists and needs to be addressed, so I do not see any reason for closing the issue at this time. But, if you don't want to work on it, you can close the PR. Thank you and sorry one more time.

@marcm-ml
Copy link
Author

I can look into it in a few days again. Will let you know 👍

@Sehat1137
Copy link
Collaborator

@marcm-ml Hi, any updates?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: Faststream does not handle exception in ASGI startup lifespan correctly
4 participants