Problem
Given the following deployment configuration:
export PGRSTBENCH_SEPARATE_PG="false" # this is just to make the deployment faster
export PGRSTBENCH_EC2_INSTANCE_TYPE="t3a.micro" # needs micro at minimum otherwise at nano the middleware instance becomes unresponsive
$ postgrest-bench-deploy
...
Using 1000 VUs for a k6 load test causes many failures:
$ postgrest-bench-k6 1000 k6/POSTBulk.js
time="2025-02-10T23:45:17Z" level=warning msg="Request Failed" error="Post \"http://pgrst/employee?columns=employee_id,first_name,last_name,title,reports_to,birth_date,hire_date,address,city,state,country,postal_code,phone,fax,email\": write tcp 10.0.0.132:36880->10.0.0.250:80: use of closed network connection"
time="2025-02-11T00:04:40Z" level=warning msg="Request Failed" error="Post \"http://pgrst/employee?columns=employee_id,first_name,last_name,title,reports_to,birth_date,hire_date,address,
city,state,country,postal_code,phone,fax,email\": read tcp 10.0.0.132:52914->10.0.0.250:80: read: connection reset by peer"
..
data_received..................: 4.4 MB 129 kB/s
data_sent......................: 198 MB 5.7 MB/s
http_req_blocked...............: avg=43.35ms min=1.12µs med=3.7µs max=11.24s p(90)=40.87ms p(95)=92.11ms
http_req_connecting............: avg=43.19ms min=0s med=0s max=11.24s p(90)=40.31ms p(95)=91.21ms
http_req_duration..............: avg=767.83ms min=0s med=953.54ms max=4.4s p(90)=1.24s p(95)=1.57s
{ expected_response:true }...: avg=1.02s min=10.53ms med=993.78ms max=4.4s p(90)=1.35s p(95)=1.72s
✗ http_req_failed................: 25.35% ✓ 18776 ✗ 55283
http_req_receiving.............: avg=1.4ms min=0s med=40.54µs max=120.22ms p(90)=403.47µs p(95)=8.3ms
http_req_sending...............: avg=2.43ms min=0s med=45.23µs max=452.1ms p(90)=2.86ms p(95)=10.87ms
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=763.99ms min=0s med=952.48ms max=4.4s p(90)=1.23s p(95)=1.57s
http_reqs......................: 37030 1077.252069/s
iteration_duration.............: avg=816.73ms min=812.71µs med=956.25ms max=11.3s p(90)=1.26s p(95)=1.63s
iterations.....................: 37029 1077.222978/s
vus............................: 0 min=0 max=1000
vus_max........................: 1000 min=1000 max=1000
No error is logged by journalctl -u postgrest. But in the Nginx log we can see a lot of connect() failed (111: Connection refused) errors:
$ postgrest-bench-ssh pgrst
$ cat /var/log/nginx/error.log | less
2025/02/10 23:37:53 [error] 1301#1301: *15884 connect() failed (111: Connection refused) while connecting to upstream, client: 10.0.0.132, server: , request: "POST /employee?columns=employee_id,first_name,last_name,title,reports_to,birth_date,hire_date,address,city,state,country,postal_code,phone,fax,email HTTP/1.1", upstream: "http://[::1]:3000/employee?columns=employee_id,first_name,last_name,title,reports_to,birth_date,hire_date,address,city,state,country,postal_code,phone,fax,email", host: "pgrst"
...
Not sure if PostgREST or Nginx is at fault here.
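To help tell the two apart, one option (not tried here) is to watch PostgREST's listen socket on the pgrst instance while the test runs. This assumes PostgREST listens on TCP port 3000, as the upstream URL above suggests; the commands are plain Linux utilities, not part of the benchmark scripts:
$ postgrest-bench-ssh pgrst
# Recv-Q/Send-Q of a listening socket show the current accept queue and its limit
$ ss -ltn 'sport = :3000'
# kernel counters that grow when the accept queue overflows or SYNs are dropped
$ netstat -s | grep -i -e 'listen queue' -e 'SYNs to LISTEN'
Connection refused in particular means the connect() got an RST, so it is also worth confirming that PostgREST is actually listening on the exact address Nginx proxies to (the errors above show it trying [::1]:3000).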
Observations
- At 800 VUs, there are no errors:
$ postgrest-bench-k6 800 k6/POSTBulk.js
Running k6 with 800 vus
data_received..................: 4.3 MB 134 kB/s
data_sent......................: 188 MB 5.9 MB/s
http_req_blocked...............: avg=6.35ms min=1.08µs med=2.8µs max=416.84ms p(90)=4.47µs p(95)=10.6µs
http_req_connecting............: avg=6.27ms min=0s med=0s max=416.79ms p(90)=0s p(95)=0s
http_req_duration..............: avg=896.09ms min=12.59ms med=845.24ms max=12.15s p(90)=1.48s p(95)=2.3s
{ expected_response:true }...: avg=896.09ms min=12.59ms med=845.24ms max=12.15s p(90)=1.48s p(95)=2.3s
✓ http_req_failed................: 0.00% ✓ 0 ✗ 53375
http_req_receiving.............: avg=70.81µs min=21.02µs med=45.5µs max=22.49ms p(90)=81.92µs p(95)=109.59µs
http_req_sending...............: avg=490.92µs min=24.07µs med=49.92µs max=76.33ms p(90)=102.82µs p(95)=315.1µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=895.53ms min=12.47ms med=845.12ms max=12.14s p(90)=1.48s p(95)=2.3s
http_reqs......................: 26688 836.207664/s
iteration_duration.............: avg=903.03ms min=13.08ms med=845.99ms max=12.28s p(90)=1.48s p(95)=2.34s
iterations.....................: 26687 836.176331/s
vus............................: 0 min=0 max=800
vus_max........................: 800 min=800 max=800
- Using a unix socket instead of TCP brings the failure rate down somewhat, to 16% from the previous 25% (a sketch of the config difference follows the results):
$ export PGRSTBENCH_WITH_UNIX_SOCKET="false"
$ postgrest-bench-deploy
$ postgrest-bench-k6 1000 k6/POSTBulk.js
data_received..................: 4.3 MB 135 kB/s
data_sent......................: 193 MB 6.0 MB/s
http_req_blocked...............: avg=25.87ms min=1.09µs med=3.05µs max=11.24s p(90)=36.15ms p(95)=78.73ms
http_req_connecting............: avg=25.69ms min=0s med=0s max=11.24s p(90)=35.73ms p(95)=77.85ms
http_req_duration..............: avg=903.86ms min=0s med=1.02s max=12.3s p(90)=1.34s p(95)=1.76s
{ expected_response:true }...: avg=1.07s min=14.32ms med=1.04s max=12.3s p(90)=1.46s p(95)=1.88s
✗ http_req_failed................: 16.22% ✓ 10494 ✗ 54195
http_req_receiving.............: avg=998.27µs min=0s med=42.96µs max=169.44ms p(90)=110.52µs p(95)=655.59µs
http_req_sending...............: avg=1.8ms min=0s med=46.79µs max=278.75ms p(90)=325.06µs p(95)=8.9ms
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=901.05ms min=0s med=1.02s max=12.29s p(90)=1.34s p(95)=1.76s
http_reqs......................: 32345 1003.925992/s
iteration_duration.............: avg=933.73ms min=685.4µs med=1.02s max=12.31s p(90)=1.35s p(95)=1.78s
iterations.....................: 32344 1003.894954/s
vus............................: 0 min=0 max=1000
vus_max........................: 1000 min=1000 max=1000
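For context, a minimal sketch of what TCP vs. unix-socket proxying looks like on the Nginx side; this is illustrative only, the real config (and socket path) is whatever nixops.nix generates:
upstream postgrest {
  # TCP mode: proxy to PostgREST over loopback
  server localhost:3000;
  # unix socket mode would point at a local socket instead, e.g.
  # server unix:/run/postgrest.sock;   # path illustrative
}

server {
  listen 80;
  location / {
    proxy_pass http://postgrest;
  }
}
Going through a unix socket removes the loopback TCP handshake and ephemeral-port churn between Nginx and PostgREST, which may explain part of the drop in failures.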
- Raising the VUs to 1200 adds a new Nginx error:
2025/02/11 00:15:19 [alert] 40240#40240: 1024 worker_connections are not enough
2025/02/11 00:15:19 [alert] 40240#40240: *36991 1024 worker_connections are not enough while connecting to upstream, client: 10.0.0.132, server: , request: "POST /employee?columns=employee_id,first_name,last_name,title,reports_to,birth_date,hire_date,address,city,state,country,postal_code,phone,fax,email HTTP/1.1", upstream: "http://127.0.0.1:3000/employee?columns=employee_id,first_name,last_name,title,reports_to,birth_date,hire_date,address,city,state,country,postal_code,phone,fax,email", host: "pgrst"
2025/02/11 00:15:30 [error] 40240#40240: *38310 connect() failed (111: Connection refused) while connecting to upstream, client: 10.0.0.132, server: , request: "POST /employee?columns=employee_id,first_name,last_name,title,reports_to,birth_date,hire_date,address,city,state,country,postal_code,phone,fax,email HTTP/1.1", upstream: "http://[::1]:3000/employee?columns=employee_id,first_name,last_name,title,reports_to,birth_date,hire_date,address,city,state,country,postal_code,phone,fax,email", host: "pgrst"
- The 1024 worker_connections value is set here: https://github.com/PostgREST/postgrest-benchmark/blob/master/nixops.nix#L182
- So it could be that 1000 VUs is too close to Nginx's worker_connections limit and that's what causes the failures (a sketch for raising the limit follows these observations).
- Increasing the instance size (e.g. to m5a.4xlarge) reduces the failure rate to 0.
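If the worker_connections ceiling is indeed the limiting factor, raising it should make the 1000-VU run clean without changing the instance size. A minimal sketch of the relevant Nginx directives (values picked arbitrarily for illustration; in this setup they would be applied through nixops.nix rather than a hand-edited nginx.conf):
# worker_connections is bounded by the worker's open-file limit, so raise that too
worker_rlimit_nofile 8192;

events {
  # per-worker cap; client connections and upstream connections both count against it
  worker_connections 4096;
}
Each proxied request ties up both a downstream (k6) and an upstream (PostgREST) connection in the worker handling it, so 1000 concurrent VUs can plausibly exhaust a 1024-connection budget.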