Fix WebSocket transport None race condition in proxy #6241
base: main
Add a transport validity check before WebSocket upgrade to prevent AssertionError when clients disconnect during handshake.
The issue occurs when a client connection is lost between the API state check and server.prepare() call, causing request.transport to become None and triggering "assert transport is not None" in aiohttp's _pre_start().
The fix detects the closed connection early and raises HTTPBadRequest with a clear reason instead of crashing with an AssertionError.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
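For readers following along, the guard described above can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the actual Supervisor code: `HTTPBadRequest`, `FakeRequest`, `FakeWebSocketResponse`, `upgrade` and `try_upgrade` are hypothetical stand-ins so the snippet runs without aiohttp, and the log message text is an assumption.

```python
import asyncio
import logging
from typing import Optional

_LOGGER = logging.getLogger(__name__)


class HTTPBadRequest(Exception):
    """Stand-in for aiohttp's web.HTTPBadRequest (assumption, keeps this stdlib-only)."""

    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


class FakeRequest:
    """Hypothetical stand-in for a request; only `transport` matters here."""

    def __init__(self, transport: Optional[object]) -> None:
        self.transport = transport


class FakeWebSocketResponse:
    """Hypothetical stand-in for the server-side WebSocket response."""

    def __init__(self) -> None:
        self.prepared = False

    async def prepare(self, request: FakeRequest) -> None:
        # Mimics the assertion quoted in the PR description.
        if request.transport is None:
            raise AssertionError("assert transport is not None")
        self.prepared = True


async def upgrade(request: FakeRequest, server: FakeWebSocketResponse) -> None:
    """Reject the upgrade early when the client has already gone away."""
    if request.transport is None:
        # Assumed log text; the real message may differ.
        _LOGGER.warning("WebSocket connection lost before upgrade")
        raise HTTPBadRequest(reason="Connection closed")
    await server.prepare(request)


async def try_upgrade(transport: Optional[object]) -> str:
    """Return a short outcome string so both paths are easy to compare."""
    server = FakeWebSocketResponse()
    try:
        await upgrade(FakeRequest(transport), server)
    except HTTPBadRequest as err:
        return f"rejected: {err.reason}"
    return "upgraded"


if __name__ == "__main__":
    print(asyncio.run(try_upgrade(object())))  # live transport
    print(asyncio.run(try_upgrade(None)))      # client already disconnected
```

Note that the check only narrows the window: the transport can still go away between the check and the `prepare()` call.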
07b7bba to 1cd499b
Can be reproduced easily here using the following reproducer:
#!/usr/bin/env python3
"""Minimal reproducer for WebSocket transport None AssertionError.
This minimal reproducer focuses on the simplest way to trigger the race condition:
- Connect to /core/websocket
- Cancel the connection immediately after starting
- Repeat 20 times to hit the narrow timing window
The bug: request.transport becomes None between check_api_state() and server.prepare()
"""
import asyncio
import aiohttp
import os
import sys
async def main():
    """Minimal reproducer - just rapid connect/cancel cycles."""
    supervisor_host = os.getenv('SUPERVISOR', 'supervisor')
    supervisor_token = os.getenv('SUPERVISOR_TOKEN')
    if not supervisor_token:
        print("ERROR: Set SUPERVISOR_TOKEN environment variable")
        sys.exit(1)
    print(f"Connecting to: {supervisor_host}")
    print("Starting rapid connect/cancel test (20 attempts)...")
    print("Monitor logs: docker logs -f hassio_supervisor | grep -E 'AssertionError|transport|WebSocket'\n")
    headers = {'Authorization': f'Bearer {supervisor_token}'}
    for attempt in range(20):
        try:
            # Create a connection but close it immediately
            # This simulates network interruption during handshake
            async with aiohttp.ClientSession(headers=headers) as session:
                # Start WebSocket connection
                ws_task = asyncio.create_task(
                    session.ws_connect(
                        f"http://{supervisor_host}/core/websocket",
                        timeout=aiohttp.ClientTimeout(total=1.0)
                    )
                )
                # Give it a tiny moment to start, then cancel
                await asyncio.sleep(0.001)  # Very short delay
                ws_task.cancel()
                try:
                    await ws_task
                except asyncio.CancelledError:
                    pass
            print(f"  Attempt {attempt}: Connection cancelled")
        except Exception as e:
            print(f"  Attempt {attempt}: Error - {type(e).__name__}: {e}")
        # Small delay between attempts
        await asyncio.sleep(0.01)
    print("\n✅ Completed 20 attempts")
    print("\nCheck for AssertionError:")
    print("  docker logs hassio_supervisor | tail -100 | grep -A 5 AssertionError")
    print("\nOr check for fix working:")
    print("  docker logs hassio_supervisor | tail -100 | grep 'WebSocket connection lost'")
if __name__ == "__main__":
    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        print("\n\nInterrupted")
    
Not to throw in confusion, a genuine question: wouldn't it be safer/earlier to also take WebSocketResponse.prepare() into consideration? It is here that the handshake and writing to the transport start, if I am correct.
If we're thinking of high-traffic systems, the connection might drop right between the request.transport check and prepare(). If my analysis is correct, then it might be worth wrapping the call or adding a targeted test to confirm it behaves as gracefully as your fix for request.transport.
          
Hm, yeah that is a good point. The  What we could do is catch assertions (it is probably fine to assume that any assertion is connection related):
try:
    await server.prepare(request)
except AssertionError as err:
    # Handle transport becoming None during prepare()
    raise web.HTTPBadRequest(reason="Connection closed") from err
I wonder if using assertion on aiohttp side is really the right approach 🤔 @bdraco thoughts?
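A targeted test for this late-disconnect case could look something like the sketch below. It is stdlib-only and uses hypothetical stand-ins (`HTTPBadRequest`, `RacyRequest`, `RacyWebSocketResponse`, `prepare_or_reject`, `check_late_disconnect`) rather than the real aiohttp classes; the fake `prepare()` simply drops the transport itself to simulate a client that disconnects after any earlier check has passed.

```python
import asyncio


class HTTPBadRequest(Exception):
    """Stand-in for aiohttp's web.HTTPBadRequest (assumption, keeps this stdlib-only)."""

    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


class RacyRequest:
    """Hypothetical request whose transport is still present at check time."""

    def __init__(self) -> None:
        self.transport = object()


class RacyWebSocketResponse:
    """Hypothetical response that loses the client in the middle of prepare()."""

    async def prepare(self, request: RacyRequest) -> None:
        # Simulate the client disconnecting after the early transport check.
        request.transport = None
        if request.transport is None:
            raise AssertionError("assert transport is not None")


async def prepare_or_reject(server: RacyWebSocketResponse, request: RacyRequest) -> None:
    """Translate an assertion raised during prepare() into a client error."""
    try:
        await server.prepare(request)
    except AssertionError as err:
        raise HTTPBadRequest(reason="Connection closed") from err


async def check_late_disconnect() -> str:
    """Return the rejection reason, or 'prepared' if nothing was raised."""
    request = RacyRequest()
    if request.transport is None:
        return "rejected early"  # not reached: the early check passes here
    try:
        await prepare_or_reject(RacyWebSocketResponse(), request)
    except HTTPBadRequest as err:
        return err.reason
    return "prepared"


if __name__ == "__main__":
    print(asyncio.run(check_late_disconnect()))
```

One trade-off of this approach is that a broad `except AssertionError` would also swallow assertion failures unrelated to the connection, which is the assumption stated in the comment above.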
    
Thanks for taking it into consideration. Other than assertion, a

Wouldn't

I've made a nasty typo whilst typing on the smartphone. You are correct. ;)
    
Proposed change
This PR fixes a race condition in the WebSocket proxy that causes AssertionError: assert transport is not None crashes when clients disconnect during the WebSocket handshake.
The issue occurs when a client connection is lost in the window between the Home Assistant API state check and the server.prepare(request) call. When this happens, request.transport becomes None, causing aiohttp's internal WebSocket upgrade code to hit an assertion failure at web_ws.py:317.
On Sentry we see ~7k events daily, limited to ~90 users. Something on those systems seems to constantly poke the WebSocket API unsuccessfully.
The fix adds a transport validity check before attempting the WebSocket upgrade. If the connection is already closed (request.transport is None), we log a warning and raise HTTPBadRequest with a clear reason instead of crashing with an unhandled AssertionError.
Stack trace before fix:
Behavior after fix:
HTTPBadRequest response with reason "Connection closed"
Type of change
Additional information
Checklist
ruff format supervisor tests)
If API endpoints or add-on configuration are added/changed: